ABSTRACT

Communication concentrators perform the basic network function of
merging many input flows into a single output flow. This requires
formatting the data and encoding side information about when messages
start, what their lengths are and what their origins and destinations
are.

This thesis examines efficient ways of performing these functions,
the objective being to minimize the average message delay, or some other
queueing theoretic quantity, like the probability of buffer overflow.

The work is divided into four parts:

- encoding of the data;

- encoding of message lengths;

- encoding of message starting times;

- encoding of message origins and destinations.

With respect to data encoding, an algorithm is given to construct
a prefix condition code that minimizes the probability of buffer overflow.

Next a theory of variable length flags is developed and applied to
the encoding of message lengths.

For concentrators with synchronous output streams, it is shown
that the concept of average number of protocol bits per message is
meaningless. Thus, in order to analyze the encoding of message starting
times, a class of flag strategies is considered in which there is a
tradeoff between delay and low priority traffic.

The problem of encoding message origins and destinations is
attacked from two different points of view. Some strategies (variations
of the polling scheme) are analyzed and shown to be much more efficient
in heavy traffic than just using a header, as is usually done. A
simplified model is also developed. Its analysis suggests that there
exist strategies to encode message origins and destinations that are
much more efficient than everything considered until now.
1. Introduction


The last decade has seen a tremendous development of computer
networks. Numerous books and papers describing and analyzing systems
have appeared (see Section 2).

From the operations research point of view, the most studied
problems are those of modelling the queueing phenomena in the
networks, of routing the messages so as to minimize some cost, usually
the average message delay, and of laying out the network in some
optimal fashion.

Computer  scientists  have  been  concerned  with  the  architecture 
of  the  computers  in  the  nodes,  and  with  the  protocol,  i.e.  control 
messages  exchanged  between  subsystems  of  the  network.  This  is  related 
to  the  problems  associated  with  distributed  computation. 

Presently the most important consideration in the design of
protocols is to get a working system where no deadlock can occur.
Little attention has usually been paid to the effects of the overhead
produced by the protocol on the performance of the network. However,
taking a queueing theorist's view of the problem, [Kleinrock et al.,
1976] pointed out that the effect was significant in the ARPANET, and
[Gallager, 1976] showed that information theory can be used to
produce basic lowerbounds on some of the information that is carried
in the protocol messages.



Our  goal  is  to  obtain  results  similar  to  those  of  Gallager, 
but  under  less  restrictive  hypotheses.  In  particular,  we  will  not 
assume  an  infinite  number  of  sources  and  links  of  infinite  capacity. 

Thus  we  will  take  into  account  queueing  effects  and  interactions 
between  sources.  One  will  find  in  this  work  concepts  and  methods 
from  the  fields  of  queueing  theory  on  one  hand,  and  information  and 
coding  theories  on  the  other. 

We do not plan to solve at once all the protocol problems in
a complete network. Instead, we pay attention only to the nodes, i.e.
the points in the network where different links join each other. From
our point of view a node can be decomposed into a "router" followed by
"concentrators" (see Figure 1.1).

The role of the router is to determine the destination of each
input bit and to send it, together with some associated information to
be described later, to the corresponding concentrator. The
concentrators merge the many input flows into one output flow.

We will not consider the structure or the optimization of the
router; instead we will regard it as a source, with known statistics,
to the concentrators.

Because their input is generally stochastic, concentrators
contain a buffer in which queueing phenomena occur. In addition to
transmitting the data they received, concentrators usually perform
other duties:

1° they reformat the data. This may involve translating characters
from one code to another, merging packets into messages or
dividing messages into packets.


[Figure 1.1: Decomposition of a Node into a Router and Concentrators]

2° they transmit service information to the downstream nodes:

- information about the line being idle or not;

- information about the origin and destination of the data.

3° they perform some kind of error control, typically implementing
an error detection and retransmission system in conjunction with
sending error detecting parity bits to the downstream node.

4° they send flow control information to the upstream nodes and/or
the router indicating that they are unable in some way to handle
the flow of data.

We will consider in this work only the first two functions; they
are related to what information theorists call "source coding," whereas
the third one is more like "channel coding." The fourth function should
be studied with the routing question and is not touched here.

Note  that  classical  source  coding  theory  is  interested  in 
transmitting  as  little  redundancy  as  possible.  In  computer  networks 
the  goal  is  usually  to  minimize  the  average  message  delay.  These  two 
objectives  are  not  always  compatible,  as  we  shall  see. 

Note at this point that we consider all higher level protocol
messages, like "end to end" messages to set up a "session," and like
flow control and routing messages, as regular data that must be
transmitted by a concentrator to another node, together with information
about its origin, destination and some error check.

The plan of this thesis is the following: in Section 2 of this
chapter, we will review previous works of interest while we present in
Section 3 an outline of the original contributions of this work. The
next four chapters describe in detail the actual results. They are
organized as follows:


In Chapter 2, we examine how the concentrator should encode the
data so as to minimize in some sense the message delays.

In practical systems the data are often transmitted in batches,
called "packets" or "messages." We analyse in Chapter 3 a very efficient
way of encoding the length of these batches. This will introduce an
encoding technique, using flags, which will be used extensively in the
next two chapters.

In Chapter 4, we study efficient ways of solving a seemingly
trivial problem: how should a concentrator indicate to the downstream
node when it transmits idle bits. This simple problem will introduce
some conceptual difficulties that appear more strongly in Chapter 5.

Chapter 5 treats the problem of encoding message origins and
destinations. It has two distinct parts: in the first part we use a
simplified model to see what issues are involved. In the second part
we examine and compare different practical strategies for encoding the
origins and destinations while not degrading too much the average
message waiting time.
2. Review of Previous Works

We rapidly review previous works of interest, considering
mainly works that give general ideas rather than technical details.
These last references are mentioned in the text as they are needed.

Should a reader need general information about computer networks,
the books of [Davies and Barber, 1973], [Abramson and Kuo, 1973] and
[Schwartz, 1977] are valuable.

[Kleinrock, 1976] is an excellent reference on queueing models
for computer systems, while [Gerla and Kleinrock, 1977] present an
overview of the problems of optimal static routing and network layout and
give a number of references. The subject of adaptive routing and
numerous references on related subjects appear in [Segall, 1977] while
[Gallager, 1977] offers an actual adaptive decentralized loopfree
algorithm.

Many of the ideas used in high level protocols today were born
during the development of the ARPANET; suitable references are [Crocker,
1972], [Cerf, 1977], [Kleinrock, 1976] and [Kleinrock and Opderbeck,
1977].

Of course the ARPANET is well known for sending data in packets.
Another network that functions in a similar way is the CYCLADES [Pouzin,
1973]. Some networks do not use this idea, but transmit the data
character by character, e.g. see [Tymes, 1971] and [Vander Mey, 1976].

The references just mentioned describe the background of this
thesis, but have no direct impact on it. We now review some works
that have a stronger relation to it.

The  motivating  paper  behind  this  thesis  is  the  one  by  [Gallager, 
1976]  which  showed  that  there  is  a trade  off  between  the  delay 
incurred  by  a message  and  the  amount  of  information  necessary  to 
indicate  its  origin  or  destination.  However,  the  delay  there  is  a 
"voluntary"  delay  in  the  sense  that  the  concentrator  sometimes  chooses 
not  to  send  a message  although  the  line  is  available.  We  will  examine 
how  "involuntary"  queueing  delays  can  be  exploited  to  minimize  the 
amount  of  protocol. 

Another paper along these lines is [Rubin, 1976]. Rubin notes
that if some rate-distortion function exists for the output of a source,
and if the output of the source encoder is sent over a link for which a
relation exists between rate and average delay, one can obtain a
delay-distortion relation. This approach is not very useful, because it
neglects the delays added by the coding process and it assumes that the
average delay on the link is only a function of the rate, and not of
other parameters of the coder output statistics. It is an unfortunate
fact that information theory is concerned only with rate.

Works that have a strong relation with this thesis are those of
[Jelinek, 1968] and [Jelinek and Schneider, 1972]. They were the first
to show that a code with minimal redundancy is not necessarily optimal
as far as buffering problems are concerned. We will use some of their
ideas and extend their results in Chapter 2.


3.  Outline  of  Original  Contributions. 




The goal of this thesis is to find efficient ways for a
concentrator to perform the source coding functions described in Section
1, which we divide into four main parts:

- encoding of the data;

- encoding of the message lengths;

- encoding of the idle times;

- encoding of the message origins and destinations.

The  objective  is  to  minimize  the  average  message  delay,  or  some  other 
queueing  theoretic  quantity,  like  the  probability  of  buffer  overflow. 

We  review  briefly  our  contributions  in  these  fields. 

In  Chapter  2,  we  present  an  algorithm  to  construct  a prefix 
condition  code  minimizing  the  probability  of  buffer  overflow.  It  is  a 
generalization  of  Huffman's  procedure. 

Variable  length  flag  strategies  are  studied  exhaustively  in 
Chapter 3. We give coding and decoding algorithms using flags, analyze
their  performance  and  sensitivity,  and  identify  the  classes  of  flags  that 
have  some  desirable  properties.  The  main  result  is  that  if  well  chosen 
flags  are  utilized  to  encode  the  length  of  a message,  the  expected  number 
of  bits  used  is  upperbounded  by  the  entropy  of  the  distribution  of  the 
message length + .56.

We study in Chapter 4 how to encode the message starting times to
minimize the average message delay. Unfortunately the best way of doing
this is still unknown. We were only able to show that the concept of
average number of protocol bits per message is useless when the line is
synchronous. We also analyzed a practical strategy, using flags, to
encode the starting times. This is a variation on the theme of the
M/G/1 queue.

Our main contributions are in Chapter 5, where we study the
encoding of the message origins. We first introduce a simplified model where
the objective is to minimize the entropy of the sequence of the origins
of the messages being transmitted. We also show that, at least for this
model, the traditional methods (e.g. forming packets or polling) are
far from being optimal. We give a lowerbound on the best achievable
performance and show how dynamic programming can be used to find the
optimal strategy.

We also analyze four practical strategies to encode the origins.
They are based on well-known queueing strategies. Our main contributions
are a closed form expression for the waiting time in cyclic queues with
symmetric inputs, and a fast algorithm to compute the waiting times in
the asymmetric case. We also solved the problem of optimal source
coding for an integer alphabet with Poisson distribution.


Chapter  2 

Source Coding to Minimize Delay
1. Introduction

We devote this chapter to the problem of source coding to
minimize delay. After presenting our model in Section 2, we consider briefly
in Section 3 how to find a code minimizing the average delay. The
problem of minimizing the probability of large delays or of buffer
overflows is treated in Section 4. Finally, we review and generalize in
Section 5 the work of [Jelinek and Schneider, 1972], which is strongly
related to the topic of this chapter.

2.  The  Model 

We propose the following model: an asynchronous memoryless
source emits symbols drawn from the alphabet $\{1, 2, 3, \ldots, c\}$; symbol
$i$ has probability $p_i$. The time intervals between two source emissions
are independent random variables with distribution function $A$. An
encoder maps the source symbols into codewords which are placed in an
output buffer of size $M$ from which one letter is removed every unit
of time (first in, first out). The output codewords are formed by
letters from an alphabet of size $d$ and the codeword corresponding to
source symbol $i$ has length $m_i$. Without loss of generality we can
assume that $c = d + k(d-1)$ for some integer $k$ and that
$p_i \ge p_{i+1} \ge 0$.

In the following sections we consider the waiting time and
delay of symbols that do not cause buffer overflows. The waiting
time is defined as the time difference between the moment a symbol
arrives at the encoder and the moment the corresponding codeword
starts leaving the buffer. The delay is the waiting time, plus the
length of the codeword. We do not consider what to do when the buffer is
empty or overflows; this is treated in Chapter 4.


3. Minimizing the Average Delay


Unfortunately, for most interemission processes, it is not
possible to compute the average delay. Sometimes, though, it is
feasible, e.g. if the buffer is infinite and if $A(t) = 1 - e^{-\lambda t}$,
$t \ge 0$. In this case the average delay is equal to (this is an
M/G/1 queue)

$$\sum_{i=1}^{c} p_i m_i + \frac{\lambda \sum_{i=1}^{c} p_i m_i^2}{2\left(1 - \lambda \sum_{i=1}^{c} p_i m_i\right)}$$

for all codes such that $\lambda \sum_{i=1}^{c} p_i m_i < 1$. However, even in this
simple case we are unable to find an algorithm yielding a code
that minimizes this expression. We can only make three general
observations valid for all problems.

First, Huffman codes, which minimize the average codeword
length, are robust for this application. They are optimal when
the load is light, because then the waiting time is negligible
compared to the average codeword length. When the load is heavy,
it is of primary importance to keep the system stable by
minimizing the average codeword length, i.e. utilizing a Huffman
code.
Next, by a simple exchange argument, one sees that in an optimum
code $m_i \le m_{i+1}$ (because $p_i \ge p_{i+1}$).

Finally, as in Huffman codes, the codewords of the d least
likely symbols have the same length.
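As an illustration of the M/G/1 case discussed in this section, the following is a minimal sketch that evaluates the average delay of a given code under Poisson arrivals, using the standard Pollaczek-Khinchine mean waiting time. The function name `mg1_average_delay` and its argument names are hypothetical and chosen here for illustration only; `lam` stands for the arrival rate $\lambda$.

```python
def mg1_average_delay(p, m, lam):
    """Average delay (waiting time plus codeword length) of a code.

    p   : symbol probabilities p_i
    m   : codeword lengths m_i (one letter leaves the buffer per unit time)
    lam : Poisson arrival rate (assumed name for lambda)
    """
    mean_len = sum(pi * mi for pi, mi in zip(p, m))         # sum p_i m_i
    second_moment = sum(pi * mi * mi for pi, mi in zip(p, m))  # sum p_i m_i^2
    load = lam * mean_len
    if load >= 1:
        raise ValueError("unstable queue: lam * sum(p_i m_i) must be < 1")
    # Pollaczek-Khinchine mean waiting time for an M/G/1 queue
    waiting = lam * second_moment / (2 * (1 - load))
    return waiting + mean_len
```

Comparing the value returned for two candidate codes at several arrival rates is one way to observe the light-load and heavy-load behaviour described in the first observation.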

4.  Minimizing  the  Probabilities  of  Buffer  Overflow  and  of  Long  Delays 
A.  Introduction 

[Kingman, 1970] showed that for infinite G/G/1 queues with
interarrival time distribution A and service time distribution B,
the stationary probability that a symbol waits more than x
units of time is upperbounded by

$$P(w > x) \le e^{-s^0 x}$$

where $w$ denotes the waiting time, $A^*$ and $B^*$ are the
Laplace-Stieltjes transforms of A and B, and $s^0$ is the supremum of
the values of s such that

$$A^*(s)\, B^*(-s) \le 1$$

Kingman's method yields the same result for finite queues.

From this, it is easy to upperbound the probability of buffer
overflow: denoting by w and b the waiting time and length of a
codeword we have

$$\begin{aligned}
\text{probability of buffer overflow} &= P(w + b > M) \\
&= P(w > M - b) \\
&\le E\left[e^{-s(M-b)}\right], \quad 0 \le s \le s^0 \\
&= B^*(-s)\, e^{-sM}, \quad 0 \le s \le s^0
\end{aligned}$$

By more complicated arguments, [Wyner, 1974] established that there
exists a lowerbound decreasing like $K e^{-s^0 M}$.

Applying these results to our model, we see that for every
code C, the probability of buffer overflow is of the order of
$e^{-s^0(C) M}$, where $s^0(C)$ is the supremum of the values of s such that
$f(C,s) := A^*(s) \sum_{i=1}^{c} p_i e^{s m_i} \le 1$. Therefore it is desirable to find a
uniquely decodable code with the largest $s^0$. Before doing this, we
will bound this largest $s^0$.
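The exponent $s^0(C)$ of a given code can be obtained numerically. The sketch below assumes Poisson arrivals of rate `lam`, so that $A^*(s) = \lambda / (\lambda + s)$; this choice, the bisection method, and the names `f` and `overflow_exponent` are assumptions made for illustration and are not taken from the text.

```python
import math

def f(p, m, s, lam):
    """f(C, s) = A*(s) * sum_i p_i exp(s m_i), with A*(s) = lam / (lam + s)."""
    return lam / (lam + s) * sum(pi * math.exp(s * mi) for pi, mi in zip(p, m))

def overflow_exponent(p, m, lam, tol=1e-12):
    """Supremum of the s >= 0 such that f(C, s) <= 1 (Poisson arrivals assumed)."""
    mean_len = sum(pi * mi for pi, mi in zip(p, m))
    if lam * mean_len >= 1:
        # The slope of f at s = 0 is non-negative and f is convex,
        # so f exceeds 1 for every s > 0.
        return 0.0
    hi = 1.0
    while f(p, m, hi, lam) <= 1:   # bracket the crossing of the level 1
        hi *= 2
    lo = 0.0
    while hi - lo > tol:           # bisection on the convex function f
        mid = (lo + hi) / 2
        if f(p, m, mid, lam) <= 1:
            lo = mid
        else:
            hi = mid
    return lo
```

For a buffer of size M, `math.exp(-overflow_exponent(p, m, lam) * M)` then gives the order of magnitude of the overflow probability discussed above.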


B. Bounds on the Largest $s^0$


This section can be considered as an extension to asynchronous
sources of results obtained by [Jelinek, 1968] and outlined in Section
5. For any uniquely decodable code $\sum_{i=1}^{c} d^{-m_i} \le 1$ [Gallager, 1968],
and by Hölder's inequality, for all $s > 0$,

$$\left(\sum_{i=1}^{c} p_i e^{s m_i}\right)^{\frac{\ln d}{\ln d + s}} \left(\sum_{i=1}^{c} d^{-m_i}\right)^{\frac{s}{\ln d + s}} \ge \sum_{i=1}^{c} p_i^{\frac{\ln d}{\ln d + s}}$$

Thus for all uniquely decodable codes,

$$\sum_{i=1}^{c} p_i e^{s m_i} \ge \left(\sum_{i=1}^{c} p_i^{\frac{\ln d}{\ln d + s}}\right)^{\frac{\ln d + s}{\ln d}}$$

with equality for a given s iff

$$m_i = m_i(s) := -\log_d \left( p_i^{\frac{\ln d}{\ln d + s}} \Big/ \sum_{j=1}^{c} p_j^{\frac{\ln d}{\ln d + s}} \right)$$
which is rarely possible, because $m_i$ must be integer. However, for
every s, there is a uniquely decodable code with

$$m_i(s) \le m_i < m_i(s) + 1$$

Thus we can conclude that the largest $s^0$ is upperbounded by $s_u$,
defined as the supremum of the values of s such that

$$A^*(s) \left(\sum_{i=1}^{c} p_i^{\frac{\ln d}{\ln d + s}}\right)^{\frac{\ln d + s}{\ln d}} \le 1$$

and lowerbounded by the supremum of the values of s such that

$$e^{s} A^*(s) \left(\sum_{i=1}^{c} p_i^{\frac{\ln d}{\ln d + s}}\right)^{\frac{\ln d + s}{\ln d}} \le 1$$

Further, $s_u$ is achievable if $m_i(s_u)$ is an integer for all $i$.


Finally, we note that if we were encoding blocks of n input
symbols, the largest $s^0$ would still be upperbounded by $s_u$, and
lowerbounded by the supremum of the values of s such that

$$e^{s/n} A^*(s) \left(\sum_{i=1}^{c} p_i^{\frac{\ln d}{\ln d + s}}\right)^{\frac{\ln d + s}{\ln d}} \le 1$$

This supremum increases to $s_u$ as n grows.
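The two bounds of this section can be evaluated numerically. The following is a minimal sketch, assuming Poisson arrivals of rate `lam` (so $A^*(s) = \lambda/(\lambda+s)$) and working with logarithms to avoid overflow; the function names, the bisection search and the cap `s_max` are illustrative assumptions, not part of the original text.

```python
import math

def log_holder_bound(p, s, d=2):
    """Logarithm of (sum_i p_i^a)^(1/a), where a = ln d / (ln d + s)."""
    a = math.log(d) / (math.log(d) + s)
    return math.log(sum(pi ** a for pi in p if pi > 0)) / a

def ideal_lengths(p, s, d=2):
    """Real-valued lengths m_i(s) giving equality (requires all p_i > 0)."""
    a = math.log(d) / (math.log(d) + s)
    total = sum(pi ** a for pi in p)
    return [-math.log(pi ** a / total, d) for pi in p]

def bounds_on_s0(p, lam, d=2, tol=1e-10, s_max=1e6):
    """Return (lower bound, upper bound s_u) on the largest s^0."""
    def log_h(s, shift):
        # log of e^{shift * s} * A*(s) * (sum_i p_i^a)^(1/a)
        return shift * s + math.log(lam / (lam + s)) + log_holder_bound(p, s, d)

    def supremum(shift):
        if log_h(tol, shift) > 0:
            return 0.0
        hi = 1.0
        while log_h(hi, shift) <= 0:
            hi *= 2
            if hi > s_max:          # treated as an unbounded supremum
                return math.inf
        lo = 0.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if log_h(mid, shift) <= 0:
                lo = mid
            else:
                hi = mid
        return lo

    return supremum(1.0), supremum(0.0)
```

At `s = 0` the ideal lengths reduce to $-\log_d p_i$, the usual lengths of classical source coding.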


C.  An  Algorithm  to  Construct  an  Optimal  Prefix  Condition  Code 

In this section we present an algorithm to construct a prefix
condition code with the largest achievable $s^0$. It is well known
[Gallager, 1968, p. 49] that no gain would be achieved by considering
non prefix condition, uniquely decodable codes. The algorithm has two
main steps that we describe first.

Step I finds a prefix condition code minimizing $\sum_{i=1}^{c} p_i e^{s m_i}$
for a given $s \ge 0$ by the following method. As [Huffman, 1952]
noticed a quarter of a century ago, there is an optimal prefix condition
code where the codewords corresponding to symbols $c - d + 1$ to $c$ are
the longest and differ only in the last character. If $c = d$, this
specifies an optimal code; if $c > d$, this reduces the problem to
finding a prefix condition code of size $c - d + 1$ minimizing

$$\sum_{i=1}^{c-d} p_i e^{s m_i} + \left(e^{s} \sum_{i=c-d+1}^{c} p_i\right) e^{s m_{c-d+1}}$$

Again we can make the same observation and continuing we will
eventually reach the point where the code is completely specified.

One sees that for $s = 0$ this algorithm yields a Huffman code,
whereas for s large enough, it assigns codewords of length
$\lceil \log_d c \rceil - 1$ to the
$(d^{\lceil \log_d c \rceil} - c)/(d-1)$
most likely symbols, and codewords of length $\lceil \log_d c \rceil$ to the others.
By definition we will say that such a code is generated for $s = \infty$.

Note that, depending on the actual implementation of the
algorithm, many different codes may be generated for a given s. They all
minimize $\sum_{i=1}^{c} p_i e^{s m_i}$ but it may happen that not all of them have the
same $s^0$.
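Step I can be sketched as a Huffman-like merging procedure in which the weight of a merged group is multiplied by $e^{s}$. The code below is an illustrative sketch, not the Fortran program of the thesis; the name `step_one`, the use of a heap and the tie-breaking rule are assumptions. It returns only the codeword lengths $m_i$, from which a prefix condition code can be built in the usual way.

```python
import heapq
import math

def step_one(p, s, d=2):
    """Codeword lengths minimizing sum_i p_i exp(s m_i) over prefix codes.

    Requires len(p) = d + k(d-1) for some integer k >= 0.
    """
    c = len(p)
    if c < d or (c - d) % (d - 1) != 0:
        raise ValueError("the number of symbols must be d + k(d-1)")
    lengths = [0] * c
    # Heap entries: (weight, unique tie-breaker, symbols in the group)
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    tie = c
    while len(heap) > 1:
        group = [heapq.heappop(heap) for _ in range(d)]  # d smallest weights
        members = [i for _, _, syms in group for i in syms]
        for i in members:
            lengths[i] += 1          # one more letter for every merged symbol
        weight = math.exp(s) * sum(w for w, _, _ in group)
        heapq.heappush(heap, (weight, tie, members))
        tie += 1
    return lengths
```

With `s = 0` the merged weight is the plain sum and the procedure is Huffman's; for very large `s` the factor `math.exp(s)` can overflow, so a practical implementation would rescale the weights.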

Step II computes the $s^0$ corresponding to a particular code.
Except in special cases this must be done by numerical methods, e.g.
the Newton-Raphson algorithm [Klerer and Korn, 1967, p. 2-59]. There are
no special problems because the function $f(C,s)$, defined at the
end of Section A, is convex in s for all codes C.

The main part of the algorithm is as follows (see Fig. 2.1):

    1        compute $s_u$
    2        $s_0 := s_u$
    3        $j := 1$
    4  Loop  use Step I to find a code minimizing $\sum_{i=1}^{c} p_i e^{s_{j-1} m_i}$;
             denote this code by $C_j$
    5        use Step II to find the $s^0$ corresponding to $C_j$;
             denote this $s^0$ by $s_j$
    6        if $s_j = s_{j-1}$, then stop
    7        else $j := j+1$
    8        go to Loop
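The iteration above can be sketched end to end as follows. This is a simplified illustration under stated assumptions: binary code alphabet only, Poisson arrivals of rate `lam` (so $A^*(s) = \lambda/(\lambda+s)$), bisection in place of Newton-Raphson for Step II, a caller-supplied starting value `s_start` instead of the computation of $s_u$, and a numerical tolerance in the stopping test. All function names are hypothetical.

```python
import heapq
import math

def step_one(p, s):
    """Step I, binary case: lengths minimizing sum_i p_i exp(s m_i)."""
    lengths = [0] * len(p)
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    tie = len(p)
    while len(heap) > 1:
        w1, _, g1 = heapq.heappop(heap)
        w2, _, g2 = heapq.heappop(heap)
        for i in g1 + g2:
            lengths[i] += 1
        heapq.heappush(heap, (math.exp(s) * (w1 + w2), tie, g1 + g2))
        tie += 1
    return lengths

def step_two(p, m, lam, tol=1e-12):
    """Step II: sup of s with lam/(lam+s) * sum_i p_i exp(s m_i) <= 1."""
    def f(s):
        return lam / (lam + s) * sum(pi * math.exp(s * mi) for pi, mi in zip(p, m))
    if lam * sum(pi * mi for pi, mi in zip(p, m)) >= 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while f(hi) <= 1:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) <= 1 else (lo, mid)
    return lo

def optimal_code(p, lam, s_start, tol=1e-9, max_iter=100):
    """Alternate Step I and Step II until s stops changing (lines 4 to 8)."""
    s_prev = s_start
    for _ in range(max_iter):
        code = step_one(p, s_prev)        # line 4
        s_new = step_two(p, code, lam)    # line 5
        if abs(s_new - s_prev) <= tol:    # line 6
            return code, s_new
        s_prev = s_new                    # lines 7 and 8
    raise RuntimeError("no convergence within max_iter iterations")
```

For the four-symbol source with probabilities 0.4, 0.3, 0.2, 0.1, the sketch selects the balanced code at a light load (`lam = 0.2`) and the Huffman code at a heavy load (`lam = 0.5`).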

Of course, we must show that this algorithm will terminate after
a finite time, and that the last code generated is optimum. The proof
is simple. First we note that $s_{j+1} \ge s_j$, $j \ge 1$, because
$f(C_j, s_j) \le 1$ (line 5), thus (line 4) $f(C_{j+1}, s_j) \le 1$, so
$s_{j+1} := \sup \{ s : f(C_{j+1}, s) \le 1 \} \ge s_j$. Secondly, we observe that
the maximum codeword length of any code generated by Step I is less than
c, so the number of codes that can possibly be generated by Step I is
finite. These two remarks ensure that the algorithm will terminate in
a finite time.

Let $C^*$ and $s^*$ be the last generated code and its $s^0$. We
must show that $C^*$ is optimal. If it is not, there will be a prefix
condition code $C'$ and a corresponding $s'$ with $s' > s^*$. Thus
$0 \le s^* < s'$, so $f(C', s') \le 1$. Also, by convexity of $f(C', s)$,
$f(C', s^*) \le 1$.

If $f(C', s^*) < 1$, $C^*$ may not be the last code generated by
the algorithm (lines 4, 5, 6).

If $f(C', s^*) = 1$ and $s^* > 0$, by invoking the facts that
$f(C', 0) = 1$, $f(C', s') \le 1$ and the convexity of $f(C', s)$ we can
conclude that $f(C', s) = 1$, $s \in [0, s']$. By analyticity of $f(C', s)$,
$s > 0$ (Laplace-Stieltjes transform of a probability distribution),
$f(C', s) = 1$, $s > 0$, so $s' = \infty$ and a fortiori $s_u = \infty$. From the
algorithm, $C_1$ is the code described earlier that is generated by
Step I for $s = \infty$. If for $C'$ the waiting time is 0 with
probability one (i.e. $s' = \infty$), it is clear that the same will be true for
$C_1$, because the length of the longest codeword in $C_1$ is no longer
than the length of the longest codeword in any other code. Thus
$\infty = s' = s_1 \le s^*$, a contradiction.

If $f(C', s^*) = 1$ and $s^* = 0$, then, as noted earlier, $C^*$
is a Huffman code, and as such minimizes
$\frac{\partial}{\partial s} f(C, s) \big|_{s=0}$ over all
codes. The fact that $s^* = 0$ implies that
$\frac{\partial}{\partial s} f(C^*, s) \big|_{s=0} \ge 0$, so
$\frac{\partial}{\partial s} f(C', s) \big|_{s=0} \ge 0$ and by convexity either
$s' = 0 = s^*$, which is a
contradiction, or $s' > 0$, $\frac{\partial}{\partial s} f(C', s) \big|_{s=0} = 0$. As in the previous
paragraph this leads to the conclusion that $f(C', s) = 1$, $s > 0$,
and to a contradiction.

We have exhausted all possibilities and may conclude that $C^*$
is optimal.
Before leaving this section, we show that if one desires to find
a prefix condition code minimizing

$$\sum_{i=1}^{c} p_i\, g(m_i)$$

then the algorithm of Step I can be used only if g is linear or
exponential.

The following conditions on g must be met for the algorithm to
work:

- g is non-decreasing,
so that if $p_i > p_j$, $m_i \le m_j$ in an optimal code;

- $g(m+1) = a\, g(m) + b$,
so that at every step the size of the problem can be reduced
by 1, while the form of the problem does not change.

These conditions imply that g must have one of the forms

$$g(m) = \alpha^{m} + \beta, \quad \alpha \ge 1$$

$$g(m) = \alpha m + \beta, \quad \alpha > 0$$

D. Numerical Results

A listing of a Fortran IV program implementing the two main steps
of the previous algorithm appears in Appendix C. This program was used
to compute the optimal code for a 128 symbol alphabet. The symbol
probabilities are equal to the relative frequencies measured in an
airline reservation system, and are listed in Table 2.1. We are
grateful to Codex Corporation for furnishing these numbers.

We used two kinds of interarrival time distributions:
deterministic and exponential. This last one is a realistic model of what
happens in practice, see [Fuchs and Jackson, 1969], or [Lewis and Yue,
1972].

Table 2.1
Symbol Probabilities Used in the Example

    Symbol  Probability     Symbol  Probability     Symbol  Probability
       1    0.208593E 00      44    0.543153E-02      87    0.434284E-03
       2    0.413809E-01      45    0.532954E-02      88    0.344279E-03
       3    0.359989E-01      46    0.515072E-02      89    0.301999E-03
       4    0.344146E-01      47    0.510923E-02      90    0.282097E-03
       5    0.341741E-01      48    0.495080E-02      91    0.281404E-03
       6    0.310807E-01      49    0.495080E-02      92    0.240114E-03
       7    0.297105E-01      50    0.431145E-02      93    0.227836E-03
       8    0.252622E-01      51    0.410917E-02      94    0.125453E-03
       9    0.250547E-01      52    0.410461E-02      95    0.123671E-03
      10    0.239849E-01      53    0.381984E-02      96    0.800050E-04
      11    0.214987E-01      54    0.373736E-02      97    0.207934E-05
      12    0.205013E-01      55    0.371647E-02      98    0.376261E-05
      13    0.204932E-01      56    0.335329E-02      99    0.100006E-04
      14    0.204295E-01      57    0.334189E-02     100    0.514884E-05
      15    0.203151E-01      58    0.323951E-02     101    0.237639E-05
      16    0.195034E-01      59    0.321822E-02     102    0.237639E-05
      17    0.170439E-01      60    0.289216E-02     103    0.891145E-05
      18    0.141916E-01      61    0.279186E-02     104    0.376261E-05
      19    0.134732E-01      62    0.271047E-02     105    0.257442E-04
      20    0.126853E-01      63    0.261284E-02     106    0.360418E-04
      21    0.126820E-01      64    0.252630E-02     107    0.192091E-04
      22    0.126658E-01      65    0.219340E-02     108    0.514884E-05
      23    0.126555E-01      66    0.213528E-02     109    0.207934E-05
      24    0.120663E-01      67    0.181754E-02     110    0.308930E-04
      25    0.115890E-01      68    0.171922E-02     111    0.171298E-04
      26    0.114259E-01      69    0.168040E-02     112    0.960456E-05
      27    0.114121E-01      70    0.155020E-02     113    0.514884E-05
      28    0.110366E-01      71    0.143781E-02     114    0.297043E-06
      29    0.104807E-01      72    0.143712E-02     115    0.178229E-04
      30    0.969496E-02      73    0.142068E-02     116    0.236648E-04
      31    0.957297E-02      74    0.136910E-02     117    0.306950E-05
      32    0.944445E-02      75    0.131790E-02     118    0.554490E-05
      33    0.932216E-02      76    0.123206E-02     119    0.138622E-05
      34    0.881332E-02      77    0.116750E-02     120    0.378241E-04
      35    0.844231E-02      78    0.942039E-03     121    0.653506E-05
      36    0.831517E-02      79    0.912136E-03     122    0.306950E-05
      37    0.826121E-02      80    0.865797E-03     123    0.930751E-05
      38    0.809219E-02      81    0.767177E-03     124    0.116839E-04
      39    0.753829E-02      82    0.719054E-03     125    0.196052E-04
      40    0.737234E-02      83    0.639347E-03     126    0.207934E-05
      41    0.648664E-02      84    0.630138E-03     127    0.116839E-04
      42    0.645882E-02      85    0.594690E-03     128    0.415867E-05
      43    0.602760E-02      86    0.583007E-03
The results appear in Fig. 2.2 and 2.3. We give some additional
information here:

- the binary entropy of the alphabet is equal to 5.32;
- the average codeword length of a Huffman code is equal to 5.35;
- the number of iterations to reach the optimal code was generally
  small (1 or 2) for Poisson arrivals, but larger (3 to 10) for
  deterministic arrivals;
- the difference between the upperbound on $s^\circ$ and the performance
  of the optimal code is extremely small (of the order of 1%) in the
  Poisson arrival case. This is the reason why the upperbound does
  not appear in Fig. 2.3.

The  average  codeword  length  of  the  optimal  code  behaves  in  the 
expected  fashion;  being  largest  in  light  traffic,  but  close  to  the 
average  codeword  length  of  the  Huffman  code  in  heavy  traffic. 


5. Review and Generalization of Jelinek and Schneider's Work


Jelinek and Schneider considered the following problem: once
per time unit a memoryless source emits a letter from the alphabet
$A := \{1, 2, \ldots, c\}$. Letter $i$ in this alphabet has probability
$p_i > 0$. An encoder maps these source letters into codewords formed by
letters from the alphabet $B := \{1, 2, \ldots, d\}$. The mapping is as
follows: a complete and proper set of $N$ prefixes $c_j$ is defined for
the alphabet $A$ (i.e. every sequence of letters from $A$ starts with one
and only one $c_j$). Prefix $c_j$ has length $r_j$ and probability $q_j$,
induced by the $p_i$'s. Every $c_j$ is mapped into one codeword $d_j$
formed by letters from $B$. Codeword $d_j$ has length $m_j$ and the
$d_j$'s are uniquely decodable, so that

\[ \sum_{j=1}^{N} d^{-m_j} \le 1 \]

(this is the Kraft inequality, see [Gallager, 1968, p. 47]). Each time the
prefix $c_j$ is recognized by the encoder, codeword $d_j$ is placed in a
buffer of size $B$ from which one letter is removed every time unit.
Jelinek and Schneider address in detail the problem of what should be
done when the buffer is empty or overflows.

Their main result is the following: for every block to variable
length code ($r_j$ constant), or variable length to block code
($m_j$ constant), there exist $K_1, K_2 > 0$ and $s^\circ$ such that, in
the stationary state, for all $B \ge 1$,

\[ K_1\, d^{-s^\circ B} \le \text{Probability of buffer overflow} \le K_2\, d^{-s^\circ B} \tag{1} \]

where $s^\circ$ is less than or equal to the supremum $s_u$ of the values
of $s$ such that

\[ d^{\,s} \ge \Big( \sum_{i=1}^{c} p_i^{\frac{1}{1+s}} \Big)^{1+s} \tag{2} \]

$s_u$ is positive if the entropy (base $d$) of the source is less than one
(this ensures stability) and is finite if $c > d$ (otherwise there
need be no queueing effect). In the sequel, we always assume that $s_u$
is positive and finite so that $s_u$ can be defined as the largest root
of the equation

\[ d^{\,s} = \Big( \sum_{i=1}^{c} p_i^{\frac{1}{1+s}} \Big)^{1+s} \]

They give algorithms yielding codes with exponent $s^\circ$ arbitrarily
close to $s_u$, and conjecture that the same result would hold in
variable length to variable length coding. We show now that this
conjecture holds.

To show that the theoretical limit on the exponent $s^\circ$ is the
same for the variable length to variable length codes as for the codes
considered by Jelinek and Schneider, it is enough to show that for
every code there exists $K > 0$ such that for every $B \ge 1$

\[ \Pr(\text{Buffer overflow}) \ge K\, d^{-s_u B} \]

Because we consider only the lowerbound, we can ignore the overhead
associated with the recovery procedures that have to be used when the
buffer overflows or becomes empty.

Denote by

$m^k$  the length of the $k$th codeword placed in the buffer
       ($k = 1, 2, 3, \ldots$)
$r^k$  the length of the prefix corresponding to the $k$th codeword
$n^k$  the number of letters in the buffer after the $k$th codeword
       has been placed in it.

Note that $m^k$ and $r^k$ are strongly dependent, but are independent of
the $m^j$'s and $r^j$'s, $j \ne k$.

We have the relation

\[ n^k = \min[B,\; m^k + \max[0,\; n^{k-1} - r^k]] \qquad k = 1, 2, \ldots \]
\[ \phantom{n^k} = \min[B,\; \max[m^k,\; n^{k-1} + m^k - r^k]] \]

and we assume $n^0 = 0$.

Now defining

\[ w^0 := 0 \]
\[ w^k := \min[B,\; \max[0,\; w^{k-1} + m^k - r^k]] \qquad k = 1, 2, \ldots \]

we see that $w^k$ obeys the standard relation for the waiting time in
a queue and that surely $n^k \ge w^k$, $k = 0, 1, 2, \ldots$
Thus the probability of an overflow for the process $n^k$ is greater
than or equal to the probability of an overflow for the process $w^k$.
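The domination of the waiting-time process by the buffer process can be checked numerically. The following is a minimal sketch, not part of the original work: the function name `simulate`, the list of (codeword length, prefix length, probability) triples and the convention that "overflow" means the process sits at the value B are all assumptions made for illustration.

```python
import random

def simulate(pairs, B, steps, seed=0):
    """Run the buffer process n^k and the waiting-time process w^k side by side.

    pairs: hypothetical list of (m, r, q) triples, i.e. codeword length,
    prefix length and probability of each prefix/codeword pair.
    Returns how often each process sits at the buffer limit B.
    """
    rng = random.Random(seed)
    weights = [q for _, _, q in pairs]
    n = w = 0                      # n^0 = w^0 = 0
    n_full = w_full = 0
    for _ in range(steps):
        m, r, _ = rng.choices(pairs, weights=weights)[0]
        n = min(B, m + max(0, n - r))      # buffer content recursion
        w = min(B, max(0, w + m - r))      # waiting-time recursion
        assert n >= w                      # n^k >= w^k holds surely
        n_full += (n == B)
        w_full += (w == B)
    return n_full, w_full

# Hypothetical code with negative drift E(m - r) = -0.25 (stable queue).
example_pairs = [(1, 2, 0.70), (3, 2, 0.25), (5, 1, 0.05)]
n_full, w_full = simulate(example_pairs, B=8, steps=20000, seed=1)
print(n_full, w_full)
```

Because $n^k \ge w^k$ at every step and both processes are capped at B, the first count can never be smaller than the second, which is the inequality used in the text.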

The results of [Wyner, 1974] can be applied to this last
process, thus for every code, there are $K > 0$ and $s^\circ$ such that

\[ \Pr(\text{Buffer overflow}) \ge K\, d^{-s^\circ B} \]

where $s^\circ$ is the largest root of

\[ \sum_{j=1}^{N} q_j\, d^{\,s (m_j - r_j)} = 1 \]

($\sum_{j=1}^{N} q_j\, d^{\,s(m_j - r_j)}$ is the Laplace-Stieltjes transform
(base $d$) of the distribution of $r^k - m^k$, $k = 1, 2, \ldots$).

[Jelinek and Schneider, 1972] give a proof of the following
Lemma, attributed to Forney:

If $s_u$ is defined as before, then for all complete and proper
sets of prefixes,

\[ \sum_{j=1}^{N} q_j^{\frac{1}{1+s_u}}\; d^{-\frac{s_u}{1+s_u} r_j} = 1 \]

Now, Hölder's inequality yields

\[ \Big( \sum_{j=1}^{N} q_j\, d^{\,s_u (m_j - r_j)} \Big)^{\frac{1}{1+s_u}}
   \Big( \sum_{j=1}^{N} d^{-m_j} \Big)^{\frac{s_u}{1+s_u}}
   \ge \sum_{j=1}^{N} q_j^{\frac{1}{1+s_u}}\; d^{-\frac{s_u}{1+s_u} r_j} \]

thus by the Lemma and the fact that $\sum_{j=1}^{N} d^{-m_j} \le 1$,

\[ \sum_{j=1}^{N} q_j\, d^{\,s_u (m_j - r_j)} \ge 1 \]

with equality if and only if

\[ \sum_{j=1}^{N} d^{-m_j} = 1 \]

and

\[ d^{-m_j} = q_j^{\frac{1}{1+s_u}}\; d^{-\frac{s_u}{1+s_u} r_j} \qquad j = 1, \ldots, N \]

Now, the function $\sum_{j=1}^{N} q_j\, d^{\,s(m_j - r_j)}$ is a
Laplace-Stieltjes transform of a probability distribution, thus it is
strictly convex (except in a trivial case), and its value at 0 is 1.
We have seen that its value is greater than or equal to 1 at
$s_u > 0$, thus it is greater than 1 for all $s > s_u$, so
$s^\circ \le s_u$ for all variable length to variable length codes.
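The Lemma and the resulting inequality can be checked numerically on a small case. The sketch below is only an illustration under stated assumptions: the ternary source, the prefix set {11, 12, 13, 2, 3}, the codeword lengths and the helper names `s_upper`, `lemma_sum` and `transform` are all hypothetical choices, and $s_u$ is found by plain bisection on the equation $d^s = (\sum_i p_i^{1/(1+s)})^{1+s}$.

```python
import math

def s_upper(p, d, lo=1e-6, hi=50.0):
    """Positive root of (1+s)*log_d(sum_i p_i**(1/(1+s))) - s = 0, by bisection.

    Assumes the source entropy (base d) is below one and len(p) > d,
    so that the function is negative near 0 and positive at `hi`.
    """
    def h(s):
        return (1 + s) * math.log(sum(x ** (1 / (1 + s)) for x in p), d) - s
    for _ in range(200):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lemma_sum(q, r, d, s):
    """Left side of the Lemma: sum_j q_j**(1/(1+s)) * d**(-s*r_j/(1+s))."""
    return sum(qj ** (1 / (1 + s)) * d ** (-s * rj / (1 + s)) for qj, rj in zip(q, r))

def transform(q, r, m, d, s):
    """sum_j q_j * d**(s*(m_j - r_j)), the transform whose root gives the exponent."""
    return sum(qj * d ** (s * (mj - rj)) for qj, rj, mj in zip(q, r, m))

# Hypothetical example: ternary source, binary code alphabet.
p, d = (0.8, 0.1, 0.1), 2
prefixes = [(0, 0), (0, 1), (0, 2), (1,), (2,)]      # complete and proper set
q = [math.prod(p[i] for i in c) for c in prefixes]   # prefix probabilities
r = [len(c) for c in prefixes]                       # prefix lengths
m = [1, 3, 3, 3, 3]                                  # codeword lengths, Kraft sum = 1

s_u = s_upper(p, d)
print(s_u, lemma_sum(q, r, d, s_u), transform(q, r, m, d, s_u))
```

For this example the Lemma sum evaluates to 1 up to rounding, and the transform at $s_u$ is at least 1, so the largest root $s^\circ$ of the transform equation cannot exceed $s_u$.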


Chapter  3 

Flag  Encoding  Schemes 
1. Introduction


Consider the problem of finding a binary Huffman code to jointly
encode a binary random variable, which is equal to 1 with probability
.15, and another random variable which takes the values (0, 1, 2, ..., 7)
with equal probability. One readily finds that the following code is
a solution:

(0,0)   000        (1,0)   111000
(0,1)   001        (1,1)   111001
(0,2)   010        (1,2)   111010
(0,3)   011        (1,3)   111011
(0,4)   100        (1,4)   111100
(0,5)   101        (1,5)   111101
(0,6)   1100       (1,6)   111110
(0,7)   1101       (1,7)   111111

This code has an interesting structure: all codewords corresponding to
(1,i) start with 111, followed by the binary representation of i.
(0,i) is encoded into the binary representation of i, except that a
0 is inserted in third position if the first two digits are 11. The
same pattern reappears in the joint Huffman encoding of a binary random
variable and a random variable taking with equal probability any one of
$2^n$ values.
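The codeword lengths of this example can be reproduced with a few lines of Python. This is a sketch for illustration only: the function name `huffman_lengths` is hypothetical, and the probabilities .85/8 and .15/8 are represented by the integer weights 17 and 3 (units of 1/160) to avoid rounding.

```python
import heapq
from itertools import count

def huffman_lengths(weights):
    """Return binary Huffman codeword lengths for a dict {symbol: weight}."""
    tie = count()                                  # tie-breaker for equal weights
    heap = [(w, next(tie), [sym]) for sym, w in weights.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(weights, 0)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)            # two least likely subtrees
        w2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:                        # every merged symbol gets one more bit
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths

# Joint alphabet (b, i): b = 1 with probability .15, i uniform on 0..7.
weights = {(b, i): (3 if b else 17) for b in (0, 1) for i in range(8)}
lengths = huffman_lengths(weights)
average = sum(weights[s] * lengths[s] for s in weights) / 160
print(sorted(lengths[(0, i)] for i in range(8)), lengths[(1, 0)], average)
```

Which six of the (0,i) symbols receive 3 bits depends on how ties are broken, but the set of lengths (six codewords of 3 bits, two of 4 bits, eight of 6 bits) and the average length of 3.6625 bits agree with the code listed above.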

This  structure  offers  the  possibility  of  doing  the  coding  in  two 
steps:  first  encoding  the  messages,  then  modifying  the  codewords,  either 
by  using  a prefix,  called  a flag,  or  inserting  an  extra  symbol  to  avoid 
confusion,  to  encode  the  binary  random  variable.  The  receiver  will 
recognize if a flag is present, possibly recognize and delete the extra
character, then decode the message.

Often in computer communication networks and elsewhere, one
needs to jointly encode messages, furnished by an outside source, and
binary information generated locally, like an "acknowledgement" or "end
of transmission." This can be done easily by introducing, when needed, a
flag, known to the receiver, at the beginning of a message, or at some
other point decided in advance, and inserting extra symbols to avoid
confusion, if necessary.

This strategy is attractive for many reasons: it is simple, does
not cause much delay, nor require much buffering because the message
is not truly reencoded and does not need to be known in its entirety.
It is optimal in some cases, as we have just seen, and can be made
adaptive, as we shall see later.

In this chapter, we will study this strategy in detail. We will
first give a very general algorithm that permits the use of any flag at any
point in a message. Next we will study the performances of this
strategy and see how it can be optimized. In the following section
we examine the use of adaptive flags to encode messages and batch lengths.
Finally we will see how reducing the class of allowable flags can improve
performances.

Before doing this, we introduce some definitions. By flag we
mean any finite sequence $(\alpha_1 \ldots \alpha_\nu)$ of symbols from the
alphabet $\{0, 1, \ldots, d-1\}$; $\nu$ is called the length of the flag
($\nu \ge 1$) while $(\alpha_1, \ldots, \alpha_{\nu-1})$ is called the root
$\rho$ ($\rho$ is possibly empty). We denote by $\beta$ the symbol,
different from $\alpha_\nu$, that is inserted when necessary to avoid
ambiguities, and we will call the sequence
$(\alpha_1, \alpha_2, \ldots, \alpha_{\nu-1}, \beta)$ the antiflag.

Fixed-length  flags  are  actually  used  in  IBM  Synchronous  Data  Link 
Control  to  encode  message  lengths  [Donnan  and  Kersey,  1974].  They  are 
analyzed  in  [Camrass  and  Gallager,  1976].  [Schalkwijk  and  Post,  1973] 
used  flags  to  encode  data  for  transmission  on  the  Binary  Symmetric 
Channel  with  Noiseless  Feedback. 
2. General Flag Coding Algorithm


We consider the following situation: a semi-infinite sequence
(the data) $(u^1, u^2, \ldots)$ of d-ary symbols is furnished to an encoder,
together with a sequence $(v^1, v^2, \ldots)$ of binary symbols. We give
an algorithm to jointly encode these two streams using flags, i.e. the
output $(x^1, x^2, \ldots)$ will consist of the sequence $(\ldots u^t \ldots)$
plus some flags or inserted symbols used to indicate the values of the
$v^t$'s.

We denote by $(\alpha^t_1, \ldots, \alpha^t_{\nu^t})$ the flag to be used
after $u^t$ if $v^t = 1$, by $\rho^t$ the root of this flag, and by
$\beta^t$ the symbol that is to be inserted in case of possible confusion.
We place no restriction on the composition of the flags, except that of
course $\beta^t \ne \alpha^t_{\nu^t}$.

Before giving the algorithms for coding and decoding we note
that they need the following features to be efficient:

a) we want to either use the flag corresponding to a $v^t$, or
   to make at most one insertion;

b) if d > 2 we want to make an insertion only when it is
   necessary, i.e. when the next symbol is the same as the
   last flag symbol or the insertion.

We will illustrate these two points. Throughout examples 1
to 3 we use d = 3 and

$\nu^1 = 4$    $(\alpha^1_1, \alpha^1_2, \alpha^1_3, \alpha^1_4) = (0, 0, 0, 0)$    $\beta^1 = 2$
$\nu^2 = 2$    $(\alpha^2_1, \alpha^2_2) = (0, 0)$                                  $\beta^2 = 2$


Example 1: Violation of requirement a)

$u^1 = 1$, $u^2 = 1$, $v^1 = 1$, $v^2 = 1$

output:   x^1   x^2     x^3     x^4    x^5     x^6     x^7   x^8     x^9
value:     1     0       0       2      0       0       1     0       0
meaning:  u^1  α^1_1   α^1_2   β^2    α^1_3   α^1_4    u^2  α^2_1   α^2_2

There we insert $\beta^2$ in the middle of the first flag to indicate that
we are not transmitting the second flag. We transmit the second flag
in $x^8$ and $x^9$. We have thus used both the flag and the insertion.
The correct way of proceeding is illustrated below.


Example 2:

output:   x^1   x^2     x^3     x^4     x^5     x^6     x^7    x^8
value:     1     0       0       0       0       0       0      1
meaning:  u^1  α^1_1   α^1_2   α^2_2   α^1_2   α^1_3   α^1_4   u^2

We realize that if $x^4$ is 0, $x^3$ and $x^4$ will be interpreted as
the second flag. We then repeat $\alpha^1_2$ in $x^5$ and continue the
transmission of the first flag, which will be decoded after the second.

If we had to transmit $u^1 = 1$, $u^2 = 1$ with $v^1 = 0$, $v^2 = 1$,
the output would be

Example 3:

output:   x^1   x^2   x^3     x^4
value:     1     1     0       0
meaning:  u^1   u^2  α^2_1   α^2_2

We see that here the second flag appears after $u^2$. To insure that
the encoder does not repeat the second flag after $u^2$ in example 2
we introduce in the algorithm below the indicator variable $w^t$ which
is initially set to 1, then to 0 as soon as an insertion or a flag
corresponding to $v^t$ is transmitted. Once $w^t = 0$ no more flag
or insertion corresponding to $v^t$ can be sent.

Let us look now at the peculiarities introduced by requirement
b). Here we use d = 3 and

$\nu^1 = 3$    $(\alpha^1_1, \alpha^1_2, \alpha^1_3) = (0, 0, 2)$    $\beta^1 = 0$
$\nu^2 = 2$    $(\alpha^2_1, \alpha^2_2) = (0, 0)$                   $\beta^2 = 2$

with $v^1 = v^2 = 0$.

Example 4:

output:   x^1   x^2   x^3   x^4
value:     1     0     0     1
meaning:  u^1   u^2   u^3   u^4

No insertion is needed, neither for $v^1$, nor for $v^2$.

Example 5:

output:   x^1   x^2   x^3   x^4   x^5   x^6
value:     1     0     0     2     0     2
meaning:  u^1   u^2   u^3   β^2   β^1   u^4

One sees that the change of value of $u^4$ from 1 to 2 provokes the
appearance of two insertions. The point is that the decision to
insert $\beta^2$ depends on the value of the next symbol, which itself
depends on the value of the next symbol!

The algorithm given below solves this problem by establishing
a first in first out stack of row vectors $s = (s_1, s_2, s_3)$. Normal
flag or data characters occupy only the first element of the vector. An
inserted character associated with $v^t$ is represented by the
triple $(?, \beta^t, \alpha^t_{\nu^t})$.

In the previous two examples, the stack would be

s(1) = $(?, \beta^2, \alpha^2_2)$ = (?,2,0)
s(2) = $(?, \beta^1, \alpha^1_3)$ = (?,0,2)
s(3) = $(u^4, -, -)$ = (1,-,-)    Example 4
                     = (2,-,-)    Example 5

As soon as a normal character enters the stack, the subroutine
"cleanstack" is called. Starting from the end it compares s(j)
with s(j-1). If $s_1(j-1) = ?$ and ($s_1(j) = s_2(j-1)$ or $s_3(j-1)$),
$s_1(j-1)$ is replaced by $s_2(j-1)$; if $s_1(j-1) = ?$ but
$s_1(j) \ne$ ($s_2(j-1)$ and $s_3(j-1)$), s(j-1) is deleted and the
stack collapsed.

Thus in Example 4 the following transformation occurs

(?,2,0)        (?,2,0)
(?,0,2)   ->   (1,-,-)   ->   (1,-,-)
(1,-,-)

whereas in Example 5

(?,2,0)        (?,2,0)        (2,-,-)
(?,0,2)   ->   (0,-,-)   ->   (0,-,-)
(2,-,-)        (2,-,-)        (2,-,-)

The stack is then emptied to yield part of the output sequence.

Before giving the algorithms we make precise 2 syntactic points:

-) a sequence indexed from i to j, such as $(\hat u^i, \ldots, \hat u^j)$,
   means the empty set if i > j

-) In a "do loop" of the form "For i := a step b until c do.."
   no statement is executed if (sign b) a > (sign b) c.

Most of the notation has been explained above or is self
evident, except $(\hat u^1, \ldots, \hat u^{t'})$. It represents the output
of the decoder. It is mimicked by the encoder. At every instant before
t'' -> t'' + 1, these sequences are equal in both encoder and receiver.
This, together with the fact that $\hat u^1, \ldots, \hat u^t$ is equal to
$u^1, \ldots, u^t$ guarantees unique decodability of the $(u^t)$ sequence.
Unique decodability of the $(v^t)$ sequence is guaranteed because the
flag to be used after $u^t$ appears if and only if $v^t = 1$.
Coding Algorithm

c1    Set the binary variables $w^i$, i > 0, to 1; $v^0$ and $w^0$ to 0
c2    Set the integer variables t, t', t'', stacksize to 0
c3    For j := 0 Step 1 until t' - 1 do
c4    begin
c5      if $(\hat u^{t'-j+1}, \ldots, \hat u^{t'}) = \rho^{t'-j}$ and $w^{t'-j} = 1$
c6      then
c7      begin
c8        $w^{t'-j}$ := 0
c9        stacksize := stacksize + 1
c10       if $v^{t'-j} = 0$
c11       then s(stacksize) := $(?, \beta^{t'-j}, \alpha^{t'-j}_{\nu^{t'-j}})$
c12       else
c13       begin
c14         $s_1$(stacksize) := $\alpha^{t'-j}_{\nu^{t'-j}}$
c15         t' := t' - j
c16         cleanstack
c17       end
c18     end
c19     else continue
c20   end
c21   t' := t' + 1
c22   if $v^t = 1$ and $w^t = 1$
c23   then $\hat u^{t'}$ := $\alpha^t_{t'-t}$
c24   else
c25   begin
c26     t := t + 1
c27     $\hat u^{t'}$ := $u^t$
c28   end
c29   stacksize := stacksize + 1
c30   $s_1$(stacksize) := $\hat u^{t'}$
c31   cleanstack
c32   go to c3

Clean Stack

cs1   For i := stacksize Step -1 until 2 do
cs2   begin
cs3     if $s_1(i-1) = ?$
cs4     then
cs5     begin
cs6       if $s_1(i) = s_2(i-1)$ or $s_1(i) = s_3(i-1)$
cs7       then $s_1(i-1)$ := $s_2(i-1)$
cs8       else
cs9       begin
cs10        stacksize := stacksize - 1
cs11        for j := i-1 Step 1 until stacksize do
cs12          s(j) := s(j+1)
cs13      end
cs14    end
cs15    else continue
cs16  end
cs17  For i := 1 Step 1 until stacksize do $x^{t''+i}$ := $s_1(i)$
cs18  t'' := t'' + stacksize
cs19  stacksize := 0

Decoding Algorithm

d1    Set the binary variables $v^1, v^2, \ldots$ to 0 and $w^1, w^2, \ldots$ to 1
d2    Set the integer variables t', t'' to 0
d3    t'' := t'' + 1
d4    For j := 0 Step 1 until t' - 1 do
d5    begin
d6      if $(\hat u^{t'-j+1}, \ldots, \hat u^{t'}) = \rho^{t'-j}$ and $w^{t'-j} = 1$
d7      then
d8      begin
d9        $w^{t'-j}$ := 0
d10       if $x^{t''} = \alpha^{t'-j}_{\nu^{t'-j}}$
d11       then
d12       begin
d13         $v^{t'-j}$ := 1
d14         t' := t' - j
d15         go to d3
d16       end
d17       else
d18       begin
d19         if $x^{t''} = \beta^{t'-j}$
d20         then t'' := t'' + 1
d21         else continue
d22       end
d23     end
d24     else continue
d25   end
d26   t' := t' + 1
d27   $\hat u^{t'}$ := $x^{t''}$
d28   go to d3

A program implementing these algorithms has been written in
Basic. Data and flag compositions were randomly chosen in a ternary
alphabet, for t = 1 to 100. The output of the coding program was
fed  into  the  decoding  program  which  decoded  it  correctly. 
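The following Python sketch gives one possible reading of the coding, clean stack and decoding algorithms; it is not the Basic program mentioned above. The names `encode`, `decode`, `clean_stack` and the dictionary `flags` (position mapped to a pair made of the flag and its inserted symbol) are assumptions. Two further simplifications are assumed: positions absent from `flags` carry no flag, and insertions still pending when the data ends are dropped.

```python
def clean_stack(stack):
    """Resolve pending insertions (None, beta, last flag symbol); return symbols to emit."""
    s = [list(e) for e in stack]
    i = len(s) - 1
    while i >= 1:
        prev, cur = s[i - 1], s[i]
        if prev[0] is None:
            if cur[0] in (prev[1], prev[2]):
                prev[0] = prev[1]          # next symbol is ambiguous: emit beta
            else:
                del s[i - 1]               # no ambiguity: drop the pending insertion
        i -= 1
    return [e[0] for e in s]

def encode(u, v, flags):
    """u: data symbols; v: set of positions t with v^t = 1; flags: {t: (alpha, beta)}."""
    w = dict.fromkeys(flags, 1)            # w[t] = 1 while flag t may still be signalled
    uh, out, stack, t = [None], [], [], 0  # uh[1:] is the sequence rebuilt by the decoder
    while True:
        j = 0
        while j <= len(uh) - 2:            # look for flag roots ending at position t'
            k = len(uh) - 1 - j
            if k in flags and w[k] and tuple(uh[k + 1:]) == flags[k][0][:-1]:
                alpha, beta = flags[k]
                w[k] = 0
                if k in v:                 # flag used: send its last symbol, drop the root
                    stack.append((alpha[-1],))
                    del uh[k + 1:]
                    out += clean_stack(stack)
                    stack = []
                else:                      # flag unused: an insertion may be needed
                    stack.append((None, beta, alpha[-1]))
            j += 1
        if t in v and w.get(t, 0):         # next root symbol of the flag after u^t
            nxt = flags[t][0][len(uh) - t - 1]
        elif t < len(u):                   # next data symbol
            t += 1
            nxt = u[t - 1]
        else:
            return out
        uh.append(nxt)
        stack.append((nxt,))
        out += clean_stack(stack)
        stack = []

def decode(x, flags):
    """Return (data symbols, set of positions whose flag was received)."""
    w = dict.fromkeys(flags, 1)
    uh, v, pos = [None], set(), 0
    while pos < len(x):
        j, found = 0, False
        while j <= len(uh) - 2 and pos < len(x):
            k = len(uh) - 1 - j
            if k in flags and w[k] and tuple(uh[k + 1:]) == flags[k][0][:-1]:
                alpha, beta = flags[k]
                w[k] = 0
                if x[pos] == alpha[-1]:    # flag recognised: remove its root
                    v.add(k)
                    del uh[k + 1:]
                    pos += 1
                    found = True
                    break
                if x[pos] == beta:         # inserted symbol: skip it
                    pos += 1
            j += 1
        if not found and pos < len(x):
            uh.append(x[pos])
            pos += 1
    return uh[1:], v

flags_a = {1: ((0, 0, 0, 0), 2), 2: ((0, 0), 2)}   # flags of examples 1 to 3
flags_b = {1: ((0, 0, 2), 0), 2: ((0, 0), 2)}      # flags of examples 4 and 5
print(encode([1, 1], {1, 2}, flags_a))             # sequence of Example 2
print(encode([1, 0, 0, 2], set(), flags_b))        # sequence of Example 5
```

With these inputs the sketch reproduces the output sequences of Examples 2 to 5, and `decode` recovers the data and the set of flagged positions from each of them.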

As a final remark, we note that there is no reason for all
flags to be known in advance. All that is needed is that if the flag
corresponding to $v^t$ has length $\nu^t$, the flag corresponding to
$v^{t+i}$ must either be known at time t, or it must be known that its
length is greater than $\nu^t - i$, this for $i = 1, 2, \ldots, \nu^t - 1$.
This guarantees that the transmission of the flag corresponding to $v^t$
will not be interrupted because of the flag used to signal $v^{t+i}$.
A. Method

We will investigate in this section the performances of the previous
algorithm, and see how they can be optimized; more precisely, we
will study how to minimize the total average number of symbols used (flag
and inserted characters) because of the possible presence of a flag at
time t. We denote by $\nu$ the length of this flag, and by p the
probability that it will be used.

We have immediately that the average number of symbols used is
equal to $p\nu + (1-p) \Pr(\text{insertion is needed})$.

We note that $\nu \ge 1$ whereas $\Pr(\text{insertion is needed}) \le 1$, so
that a flag should never be used to indicate an event of probability
greater than .5; rather a flag should be used to indicate the
complement of this event. From now on we will assume $p \le .5$.

In general, Pr(insertion is needed) is a complicated function
of the data process statistics, of the flag composition and of the
compositions and probabilities of insertion of the neighboring flags.
To avoid this difficulty we use a trick dear to information theorists,
i.e. we will average Pr(insertion is needed) over the ensemble of flag
compositions and insertion symbols. If the flag is not used after time
t, an insertion due to this flag will occur if the $\nu$ symbols that
follow are equal to the flag or the antiflag. If their
compositions are chosen randomly, the probability of an insertion is
$2d^{-\nu}$. We will therefore minimize on $\nu$ the function $f(p,\nu)$
defined by $f(p,\nu) := p\nu + (1-p)\, 2d^{-\nu}$. We will denote by
$\nu^\circ(p)$ a value of $\nu$ that minimizes $f(p,\nu)$.

We stress that the value of $f(p,\nu)$ is an ensemble average over
the composition of the flag, and that there is no guarantee that a
particular flag will perform as well. However, we are sure that for
every u and v processes there will be at least a flag composition that
will achieve this or a better result. Consequently, we do not claim
that $\nu^\circ(p)$ is the length of the flag which causes the use of the
minimum average number of symbols, but only that there is a flag of length
$\nu^\circ(p)$ which will use no more than an average of
$f(p,\nu^\circ(p))$ symbols for each given u and v process.

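B. Optimization and Performance Analysis

If we allow $\nu$ to take real values, one checks that for p
fixed $f(p,\nu)$ is convex in $\nu$, and takes its minimum value

\[ p \Big( \log_d \frac{1-p}{p} + \log_d (2 \log_e d) + \log_d e \Big)
   \quad \text{at} \quad \nu = \log_d \Big( 2\, \frac{1-p}{p} \log_e d \Big). \]

Of course, $\nu^\circ(p)$ must be integer, and by convexity of $f(p,\nu)$
one sees that it must be equal to $\lceil \nu'(p) \rceil$ or
$\lfloor \nu'(p) + 1 \rfloor$ where $\nu'(p)$ is such that

\[ f(p, \nu'(p)) = f(p, \nu'(p) + 1) \]

This equation yields

\[ \nu'(p) = \log_d \frac{1-p}{p} + \log_d \frac{2(d-1)}{d} \]

\[ \nu^\circ(p) = \Big\lceil \log_d \frac{1-p}{p} + \log_d \frac{2(d-1)}{d} \Big\rceil
   \quad \text{or} \quad \Big\lfloor \log_d \frac{1-p}{p} + \log_d 2(d-1) \Big\rfloor \]

Moreover, for every p the value of $f(p,\nu^\circ(p))$ (which is a piecewise
linear function of p (see fig. 3.1)) will be lowerbounded by the minimum
value on $\nu$ of $f(p,\nu)$ and upperbounded by $f(p,\nu'(p))$, thus

\[ p \Big( \log_d \frac{1-p}{p} + \log_d (2 \log_e d) + \log_d e \Big)
   \le f(p, \nu^\circ(p))
   \le p \Big( \log_d \frac{1-p}{p} + \log_d \frac{2(d-1)}{d} + \frac{d}{d-1} \Big) \]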

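Specializing these results to the case d = 2, we see that
$\nu^\circ(p) = \lceil \log_2 \frac{1-p}{p} \rceil$ or
$\lfloor \log_2 \frac{1-p}{p} + 1 \rfloor$ (figure 3.2) or equivalently
$\nu^\circ(p)$ is such that

\[ \frac{1}{2^{\nu^\circ(p)} + 1} \le p \le \frac{1}{2^{\nu^\circ(p) - 1} + 1} \]

and the value of $f(p,\nu^\circ(p))$ is lowerbounded by
$p(\log_2 \frac{1-p}{p} + 1.91393)$ and upperbounded by
$p(\log_2 \frac{1-p}{p} + 2)$.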
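The optimal flag length and the two bounds for d = 2 can be checked by brute force. This is an illustrative sketch: the names `f`, `nu_opt` and `nu_closed_form` and the search limit of 64 symbols are assumptions, and the constant 1.91393 is recomputed as $\log_2(2 \log_e 2) + \log_2 e$ rather than typed in.

```python
import math

def f(p, nu, d=2):
    """Ensemble-average cost p*nu + (1-p)*2*d**(-nu) of a flag of length nu."""
    return p * nu + (1 - p) * 2 * d ** (-nu)

def nu_opt(p, d=2, nu_max=64):
    """Integer flag length minimizing f(p, nu), found by exhaustive search."""
    return min(range(1, nu_max + 1), key=lambda nu: f(p, nu, d))

def nu_closed_form(p):
    """Closed form for d = 2 and 0 < p < 1/2: ceiling of log2((1-p)/p)."""
    return math.ceil(math.log2((1 - p) / p))

C = math.log2(2 * math.log(2)) + math.log2(math.e)   # about 1.91393

for p in (0.4, 0.3, 0.1, 0.01, 0.001):
    nu = nu_opt(p)
    lower = p * (math.log2((1 - p) / p) + C)
    upper = p * (math.log2((1 - p) / p) + 2)
    print(p, nu, nu_closed_form(p), lower, f(p, nu), upper)
```

At values of p where $\log_2 \frac{1-p}{p}$ is an integer two consecutive lengths are equally good, which is why such boundary points are avoided in the loop.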


It is interesting to compare the average number of bits (counting
a symbol as $\log_2 d$ bits) used by this scheme to the binary entropy
$H(p) := -p \log_2 p - (1-p) \log_2 (1-p)$ for the following reason: in
general H(p) is not a lowerbound to the average number of bits used
when a particular flag is utilized, because we are jointly encoding the
data and the fact that an event occurs. However if the entropy of the
data is $\log_2 d$ bits per symbol, and if
the only event to be signalled is the one we are considering at
time t, H(p) is a lowerbound to the average number of bits used by
any scheme to indicate the possible occurrence of the event. Because
$f(p,\nu)$ does not depend on any hypothesis about the data or the other
flags, H(p) is a lowerbound to $(\log_2 d)\, f(p,\nu)$.

From this remark and the bounds developed earlier, one finds
immediately:

\[ \max\big(0,\; p(\log_2(2 \log_e d) + \log_2 e) + \log_2(1-p)\big)
   \le (\log_2 d)\, f(p, \nu^\circ(p)) - H(p) \]
\[ \le p \Big( \log_2 \frac{2(d-1)}{d} + \frac{d}{d-1} \log_2 d \Big) + \log_2 (1-p) \]
\[ \le p \Big( \log_2 \frac{2(d-1)}{d} + \frac{d}{d-1} \log_2 d - \log_2 e \Big) \]

The last inequality uses the fact that $\log_2 (1-p) \le -p \log_2 e$.

In particular, for d = 2 we obtain

\[ \max(0,\; 1.91393\, p + \log_2 (1-p)) \le f(p, \nu^\circ(p)) - H(p)
   \le 2p + \log_2 (1-p) \le .55730\, p \]

For small p, for which $\log_2 (1-p) \approx -p \log_2 e$,

\[ .47123\, p \le f(p, \nu^\circ(p)) - H(p) \le .55730\, p \tag{1} \]

As p goes to 0, $\frac{f(p,\nu^\circ(p)) - H(p)}{p}$ oscillates between
.47123 and .55730. These facts will be used later.

For d = 2, then, flag schemes are quite efficient, but they
deteriorate as d increases: the lowerbound on
$(\log_2 d)\, f(p,\nu) - H(p)$ increases like $\log_2 (\log_e d)$ while the
upperbound increases like $\log_2 d$.
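The gap between the flag cost and the binary entropy for d = 2 can be tabulated directly. The sketch below is only an illustration; the names `f`, `H`, `nu_opt` and `gap` are hypothetical, and the search limit of 64 symbols is an assumption.

```python
import math

def f(p, nu):
    """Average number of symbols p*nu + (1-p)*2*2**(-nu) for a binary flag."""
    return p * nu + (1 - p) * 2.0 ** (1 - nu)

def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def nu_opt(p, nu_max=64):
    """Integer flag length minimizing f(p, nu)."""
    return min(range(1, nu_max + 1), key=lambda nu: f(p, nu))

def gap(p):
    """Excess of the optimized flag cost over the entropy H(p)."""
    return f(p, nu_opt(p)) - H(p)

C = math.log2(2 * math.log(2)) + math.log2(math.e)    # about 1.91393

for p in (0.25, 0.05, 0.01, 0.001):
    lower = max(0.0, C * p + math.log2(1 - p))
    upper = 2 * p + math.log2(1 - p)
    print(p, lower, gap(p), upper, gap(p) / p)
```

The last printed column is the ratio $(f(p,\nu^\circ(p)) - H(p))/p$, which stays between roughly .47 and .56 for the small values of p listed.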

C. Sensitivity Analysis

We will investigate here the sensitivity of the performance of the
flag schemes. Two issues are at hand: First, how does a wrong choice
of $\nu$ degrade $f(p,\nu)$ for a given p? Second, if p is imperfectly
known, how does an error in the estimate of p affect the choice of the
flag length? We will treat these problems for d = 2 only.

The first point is easy to treat. If one uses a flag of length
$\nu^\circ(p) + k$ in place of $\nu^\circ(p)$ the penalty is equal to

\[ f(p, \nu^\circ(p) + k) - f(p, \nu^\circ(p))
   = \Big( k - \frac{1-p}{p}\, \frac{2^{k} - 1}{2^{\nu^\circ(p) + k - 1}} \Big)\, p \]

for k > 0 as well as for k < 0.

These are saw-toothed functions of p, and are plotted in figure 3.3
for k = 1 and k = -1. These expressions are exact but do not give much
insight, so we will derive simple upperbounds. We recall that $f(p,\nu)$
is a convex function of $\nu$, and that
$\nu'(p) \le \nu^\circ(p) \le \nu'(p) + 1$.
Thus, by convexity, for k > 0

\[ f(p, \nu^\circ(p) + k) \le
   \frac{\nu'(p) + 1 - \nu^\circ(p)}{k + \nu'(p) + 1 - \nu^\circ(p)}\, f(p, \nu^\circ(p))
   + \frac{k}{k + \nu'(p) + 1 - \nu^\circ(p)}\, f(p, \nu'(p) + 1 + k) \]

and

\[ f(p, \nu'(p) + 1) \le
   \frac{\nu'(p) + 1 - \nu^\circ(p)}{k + \nu'(p) + 1 - \nu^\circ(p)}\, f(p, \nu'(p) + 1 + k)
   + \frac{k}{k + \nu'(p) + 1 - \nu^\circ(p)}\, f(p, \nu^\circ(p)) \]

Adding these inequalities, one obtains

\[ f(p, \nu^\circ(p) + k) + f(p, \nu'(p) + 1) \le f(p, \nu'(p) + 1 + k) + f(p, \nu^\circ(p)) \]

or

\[ f(p, \nu^\circ(p) + k) - f(p, \nu^\circ(p)) \le f(p, \nu'(p) + 1 + k) - f(p, \nu'(p) + 1) \]

Computing the right hand side member, one gets

\[ f(p, \nu^\circ(p) + k) - f(p, \nu^\circ(p)) \le (k + 2^{-k} - 1)\, p \]

Similarly, for k < 0, one has

\[ f(p, \nu^\circ(p) + k) \le
   \frac{-k}{\nu^\circ(p) - \nu'(p) - k}\, f(p, \nu'(p) + k)
   + \frac{\nu^\circ(p) - \nu'(p)}{\nu^\circ(p) - \nu'(p) - k}\, f(p, \nu^\circ(p)) \]

\[ f(p, \nu'(p)) \le
   \frac{\nu^\circ(p) - \nu'(p)}{\nu^\circ(p) - \nu'(p) - k}\, f(p, \nu'(p) + k)
   + \frac{-k}{\nu^\circ(p) - \nu'(p) - k}\, f(p, \nu^\circ(p)) \]

Adding these inequalities, one obtains

\[ f(p, \nu^\circ(p) + k) + f(p, \nu'(p)) \le f(p, \nu'(p) + k) + f(p, \nu^\circ(p)) \]

and thus

\[ f(p, \nu^\circ(p) + k) - f(p, \nu^\circ(p)) \le (k + 2^{1-k} - 2)\, p \]

These upperbounds are plotted in figure 3.3 for k = 1 and k = -1. The
penalty is always less than .5p if one uses a flag length too large
by one symbol, whereas it is less than p if the length is too small
by one symbol. The same pattern appears for larger |k|, the penalty
increasing roughly like kp for k > 0, but like $2^{|k|} p$ for k < 0.

It will be important later to have an upperbound on f(p,2) - H(p)
for p between 1/3 and 1/2, i.e. in the region where $\nu^\circ(p) = 1$,
because flags of length 1 have some awkward properties, and we will
wish to use flags of length 2 instead. We want an upperbound of the
form $\alpha p \ge f(p,2) - H(p)$. Because this function is convex, the
tightest upperbound of this form will equal it at p = 1/3 or p = 1/2, so

\[ \alpha = \max\big( 3\,(f(1/3, 2) - H(1/3)),\; 2\,(f(1/2, 2) - H(1/2)) \big) = .5 \tag{2} \]
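The sensitivity bounds for d = 2 and the constant in (2) can be verified numerically. The sketch below is an illustration under assumptions: the names `f`, `H`, `nu_opt` and `penalty`, the search limit of 64 symbols and the chosen test probabilities are not from the original text.

```python
import math

def f(p, nu):
    """Average number of symbols p*nu + (1-p)*2*2**(-nu) for a binary flag."""
    return p * nu + (1 - p) * 2.0 ** (1 - nu)

def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def nu_opt(p, nu_max=64):
    """Integer flag length minimizing f(p, nu)."""
    return min(range(1, nu_max + 1), key=lambda nu: f(p, nu))

def penalty(p, k):
    """Extra cost of using length nu_opt(p) + k instead of nu_opt(p)."""
    nu = nu_opt(p)
    return f(p, nu + k) - f(p, nu)

for p in (0.3, 0.1, 0.01):
    for k in (1, 2, 3):                          # flag too long by k symbols
        assert penalty(p, k) <= (k + 2.0 ** (-k) - 1) * p + 1e-12
    for k in (-1, -2):                           # flag too short by |k| symbols
        if nu_opt(p) + k >= 1:
            assert penalty(p, k) <= (k + 2.0 ** (1 - k) - 2) * p + 1e-12

alpha = max(3 * (f(1 / 3, 2) - H(1 / 3)), 2 * (f(1 / 2, 2) - H(1 / 2)))
print(alpha)
```

The maximum in (2) is reached at p = 1/2, where f(1/2, 2) = 1.25 and H(1/2) = 1, while the value at p = 1/3 is about .245.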

The second point, the sensitivity of the optimal length to an
error in the estimate of p, is more difficult to assess, due to the
discontinuities in $\nu^\circ(p)$ (figure 3.2). A good rule of thumb is
that when p is overestimated or underestimated by about a factor of 2,
the resulting flag length is too small or too large by one symbol.


Figure 3.3: Penalty for not Using the Optimal Flag Length
4. Adaptive Flag Strategies to Encode Batch and Message Lengths

We consider the following problem: a batch of messages must be
transmitted on a noiseless binary link. We denote by m the random
number of messages in a batch, and by $b_1, b_2, \ldots$ the lengths (number
of bits) of these messages. Being motivated by the case where a batch
would be the set of messages in a busy period of a G/G/1 queue, we
model the $b_i$'s as independent identically distributed random
variables, but we let the probability of having m messages in a
batch depend on the lengths of these messages as follows.

Let $(\Omega, S, P)$ be a probability space.

$b_1, b_2, \ldots$ be a sequence of measurable functions
$b_i: \Omega \to \mathbb{N}^{+}$ ($\mathbb{N}^{+} := \{1, 2, \ldots\}$)

m be a measurable function $m: \Omega \to \mathbb{N}^{+}$

be the smallest $\sigma$-algebra making $b_i$
measurable

We require the $b_i$'s and m to have the following properties:

the $b_i$'s are independent and have a probability
mass function B

$b_1$ and m have finite means

In words, the second property says that the knowledge of $b_i$
does not give any information as to whether or not m is smaller
than i.

Our problem is that not only must we transmit the messages, but
we must also indicate the number of messages in the batch and their
lengths. We assume the starting time of the transmission to be known
to the receiver. We will examine different schemes to furnish this
information, and we will evaluate their performances. Before doing this,
we characterize precisely what we mean, and compute the entropy of the
information we are sending.

We want to specify to the receiver which event from the countable
set A of disjoint events,

\[ A := \{ \{\omega : m(\omega) = k,\; b_1(\omega) = x_1, \ldots, b_k(\omega) = x_k\};\;
   k, x_1, \ldots, x_k \in \mathbb{N}^{+} \}, \]

occurred. Note that $\cup A = \Omega$. To obtain a
simple expression when computing the entropy of A, it is handy to
define the functions $R_k$, $k \in \mathbb{N}^{+}$, by
$R_k : (\mathbb{N}^{+})^k \to \mathbb{R}$,

\[ R_k(x_1, \ldots, x_k) =
   \begin{cases}
   \dfrac{P(\{\omega : m(\omega) = k,\; b_1(\omega) = x_1, \ldots, b_k(\omega) = x_k\})}
         {\prod_{i=1}^{k} B(x_i)}
     & \text{if } \prod_{i=1}^{k} B(x_i) > 0 \\[2ex]
   0 & \text{otherwise}
   \end{cases} \]

In words, $R_k(b_1, \ldots, b_k)$ is the conditional probability that the
batch contains k messages, given the lengths of the first k messages.
We often denote $R_k(b_1, \ldots, b_k)$ by $R_k(b)$.

It is now easy to write the entropy of $A$ as

$$H(A) = E\left(-\log_2\left(R_{m(\omega)}(b(\omega)) \prod_{i=1}^{m(\omega)} B(b_i(\omega))\right)\right)$$

$$= E\left(-\sum_{i=1}^{m(\omega)} \log_2 B(b_i(\omega)) - \log_2 R_{m(\omega)}(b(\omega))\right)$$

$$= E(m)\, H(B) - E\left(\log_2 R_{m(\omega)}(b(\omega))\right)$$

by the theorem proved in Appendix A, which holds because of the conditions imposed earlier on the $b_i$'s and $m$. $H(B)$ denotes

$$-\sum_{i=1}^{\infty} B(i) \log_2 B(i)$$

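This can be rewritten

$$H(A) = E(m)\, H(B) - E \sum_{i=1}^{\infty} R_i(b(\omega)) \log_2 R_i(b(\omega))$$

and can be put under the form

$$H(A) = E(m)\, H(B) + E \sum_{i=1}^{\infty} R^c_{i-1}(b(\omega))\, H\!\left(\frac{R_i(b(\omega))}{R^c_{i-1}(b(\omega))}\right) \qquad (3)$$

with

$$R^c_0 := 1, \qquad R^c_i := 1 - \sum_{j=1}^{i} R_j, \quad i \ge 1$$

This form will be useful later.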

We will refer to the second term in (3) as the conditional entropy of the number of messages given their lengths. It is smaller than the entropy of the number of messages, which itself is bounded by $E(m) H(1/E(m))$ [Gallager, 1968, pp. 25 and 507]. This upperbound is achieved if $m$ is geometrically distributed and independent of the message lengths. Because $E(m) H(1/E(m))$ is approximately equal to $\log_2(e E(m))$, the second term in (3) is generally smaller than the first.
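The two claims in the paragraph above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes a geometric batch size on {1, 2, ...} with mean E(m), and the function names are illustrative only.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def geometric_entropy(mean, terms=100_000):
    """Entropy in bits of Pr(m = k) = p (1-p)^(k-1), k >= 1, with p = 1/mean.

    The infinite sum is truncated; the neglected tail is negligible for the
    means used below.
    """
    p = 1.0 / mean
    prob, total = p, 0.0
    for _ in range(terms):
        if prob == 0.0:          # floating-point underflow: remaining terms vanish
            break
        total -= prob * math.log2(prob)
        prob *= 1.0 - p
    return total

def batch_size_entropy_bound(mean):
    """The bound E(m) H(1/E(m)) quoted in the text."""
    return mean * binary_entropy(1.0 / mean)

for mean in (2, 8, 64, 1000):
    print(mean,
          geometric_entropy(mean),           # entropy of a geometric batch size
          batch_size_entropy_bound(mean),    # E(m) H(1/E(m))
          math.log2(math.e * mean))          # approximation log2(e E(m))
```

The first two printed columns coincide (the geometric distribution achieves the bound), and the third approaches them from above as E(m) grows.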

We go on to the analysis of some coding schemes to transmit the information in $A$. From the point of view of minimizing the expected codeword length, the optimum would be to jointly encode the number and lengths of the messages. This method uses at most one more bit than the theoretical minimum, but is generally infeasible, and can lead to large delays because all messages must be known to the transmitter before the appropriate codeword can be found.

It would be easier to encode separately the number of messages and their lengths, in such a way that a message could be transmitted and decoded correctly before all messages in the batch have been processed by the transmitter. We will examine three strategies in this class, using flags.

The first two strategies have in common that they transmit sequentially each message together with a codeword indicating its length. If the codewords are well chosen, this will require an average number of bits between $E(m) H(B)$ and $E(m)(H(B) + 1)$.

To indicate the end of a batch, the first strategy transmits a flag of length $v$ after the last message, and makes appropriate insertions in the other messages. By the usual random coding argument, this will use an average of

$$v + (E(m) - 1)\, 2^{-(v-1)} = E(m) \left[ \frac{1}{E(m)}\, v + \left(1 - \frac{1}{E(m)}\right) 2^{-(v-1)} \right]$$

bits, so that, as we have seen earlier, the optimum $v = v^0(1/E(m))$ if $E(m) \ge 2$. If $E(m) < 2$, the flag should be used after a message if it is not the last in the batch. We do not consider this case any further. From previous studies this choice of flag length will use at most an average of $\log_2(E(m) - 1) + 2$ bits, which lies between $E(m) H(1/E(m))$ and $E(m) H(1/E(m)) + .55730$. Thus this strategy is efficient if the conditional entropy of the number of messages given their lengths is close to its maximum.

The second strategy, using variable flag lengths, is efficient under all circumstances. The idea is that at the end of the transmission of the $i$-th message, both transmitter and receiver know $b_1, b_2, \ldots, b_i$, and can compute $R_i(b)$ and $R^c_{i-1}(b)$. The cost of using a flag of length $v$ to indicate that message $i$ is the last one in the batch, given there are more than $i-1$ messages in the batch, is

$$\frac{R_i(b)}{R^c_{i-1}(b)}\, v + \left(1 - \frac{R_i(b)}{R^c_{i-1}(b)}\right) 2^{-(v-1)}$$

Thus $v$ should be a function of $R_i(b) / R^c_{i-1}(b)$ if $R_i(b) \le \frac{1}{2} R^c_{i-1}(b)$; the strategy should be changed as indicated earlier if $R_i(b) > \frac{1}{2} R^c_{i-1}(b)$. Given $b$, this scheme uses less than

$$H\!\left(\frac{R_i(b)}{R^c_{i-1}(b)}\right) + \frac{R_i(b)}{R^c_{i-1}(b)}\, .55730 \ \text{bits}$$

We will incur this cost if the number of messages in the batch is greater than $i-1$, so the average total number of bits used is less than

$$E \sum_{i=1}^{\infty} \left( R^c_{i-1}(b)\, H\!\left(\frac{R_i(b)}{R^c_{i-1}(b)}\right) + R_i(b)\, .55730 \right) = E \sum_{i=1}^{\infty} R^c_{i-1}(b)\, H\!\left(\frac{R_i(b)}{R^c_{i-1}(b)}\right) + .55730$$

(by comparison with (3)), which is very efficient. Note that if $R_i(b) / R^c_{i-1}(b) \ll 1$ for all $i$, we have from formula (1) that the average number of bits used is larger than

$$E \sum_{i=1}^{\infty} R^c_{i-1}(b)\, H\!\left(\frac{R_i(b)}{R^c_{i-1}(b)}\right) + .47123\ , \ \text{approximately.}$$
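The constants .55730 and .47123 can be reproduced numerically. The sketch below is illustrative only: it assumes that the per-decision cost of a flag of length v used with probability p is f(p, v) = p v + (1 - p) 2^{-(v-1)}, which is the form of the cost expression in this section; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def flag_cost(p, v):
    """Assumed cost: v bits with probability p, else an insertion with probability 2^-(v-1)."""
    return p * v + (1.0 - p) * 2.0 ** (-(v - 1))

def best_flag_length(p, v_max=64):
    """Integer flag length minimizing the assumed cost (exhaustive search up to v_max)."""
    return min(range(1, v_max + 1), key=lambda v: flag_cost(p, v))

def redundancy(p):
    """Excess of the best flag cost over the binary entropy H(p)."""
    return flag_cost(p, best_flag_length(p)) - binary_entropy(p)

UPPER = 2.0 - math.log2(math.e)          # 0.55730..., upper constant
LOWER = 1.0 + math.log2(math.log(2.0))   # 0.47123..., small-p lower constant

for p in (0.5, 0.1, 0.01, 0.001):
    print(p, best_flag_length(p), redundancy(p) / p)
```

Under this assumed cost, the redundancy per unit of p stays below 2 - log2(e) for p up to 1/2, and for small p it stays above roughly 1 + log2(ln 2), which matches the two constants quoted in the text.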


The only problem with this strategy is that in general it does not meet the requirement of the general flag coding algorithm of Section 2 that if a flag of length $v$ may be inserted at time $t$, flags starting at time $t+i$ must either be of length greater than $v-i$, or be known to both transmitter and receiver at time $t$. There are two remedies to this: one is to assume that the message lengths are larger than the longest flag, which often makes sense; the other is to use a special class of flags developed in the next section. They do not have this requirement, but two new problems arise then. The averaging on the flag composition to get $f(p,v)$ does not work anymore, and this special class does not contain flags of length one.

These difficulties can be overcome: on one hand, if for all $j$ all messages of length $j$ are equally likely, $f(p,v)$ will still be an upperbound on the average number of bits used by a flag of length $v$ from this class; on the other hand we have shown in (2) that the upperbound $f(p, v(p)) - H(p) \le p\, .55730$ still holds if one uses flags of length 2 instead of flags of unit length, thus the penalty for not using the optimal length is not unbearable.

To conclude the analysis of this variable flag length algorithm, we note that it can also be used to encode the length of a message. It is sufficient to replace the word message by the word symbol in the previous description, and to use flags from the special class mentioned above. If for all $j$ all messages of length $j$ are equally likely, the conclusion that the average number of bits used will be less than the entropy of the message length + .55730 still holds.

The third strategy works only in the case where the messages have a variable length. It is based on the observation by [Gallager, 1978] that any Huffman code can be modified so that a 2 symbol prefix, say 00, is not used, and so that the resulting redundancy is between .41503 and 1. The strategy is as follows: transmit sequentially each message together with a modified Huffman codeword indicating its length. After the last message in the batch, send 00. The number of bits used by this strategy lies between $E(m)(H(B) + .41503) + 2$ and $E(m)(H(B) + 1) + 2$. This strategy is indeed a flag strategy, so it must be less efficient than the previous optimal algorithm, but it is extremely easy to implement.
5. Desirable Flag Compositions

As we have noted earlier, the algorithm given in Section 2 suffers from the fact that insertions and flags may appear in the middle of other flags, and consequently that the $v^t$'s are not necessarily received in order, and that the flag corresponding to $v^t$ may have to be specified before time $t$. This complicates the algorithm and removes some freedom in using adaptive flags.

The problem of flags appearing in flags could be solved at the expense of making more insertions, but this can lead to more than one insertion per possible flag use and the analysis of Section 3 breaks down. We will not pursue this approach.

Instead we look at this in the context of Sections 3 and 4, where the important parameter from the user's point of view is the flag length, not the flag composition. We assume that we have a class of flags containing at most one flag of each length and we use only flags from this class in the following algorithms. The main difference between these algorithms and those of Section 2 is that flags are inserted at once (c'19, c'20, c'21) whereas in Section 2 a check was made between flag symbols to see if insertions were needed. Thus here no flags or insertions will appear in flags. Of course these algorithms will not work with all classes; we say that a class is allowable if the compositions of the flags in the class are such that the decoding algorithm yields the correct output for all associations of flags in the class with $v^t$.




Coding  Algorithm 


c'l 

Set 

the  integer  variables  t and  t" 

to  0 . 

c'2 

Set 

the  integer  variable  i to  -1 

c'3 

For 

j 0 Step  1 until  i do 

c ' 4 

begin 

c’S 

■r  f t-j+1  t^  t-j  , 

if  Cu  , . . . ,u  ) ■ p ^ and 

t+1 

U 3 

or 

a"-:* 

"t- 

c'6 

then 

c'7 

begin 

c'8 

t"  :»  t"  + 1 

c'9 

t"  .t-j 

c'lO 

i :*  j - 1 

c'll 

e''d 

c'12 

else  continue 

c’13 

end 

c'14 

t :• 

t*  1 

c'lS 

t" 

* t"  + 1 

c'16 

t" 

X 

t 

:«  u 

c'17 

i :• 

i 1 

c'18 

if 

v^  ■ 1 

c'19 

then 

c'20 

begin 

c'21 

for  j»l  Step  1 until  i 

do 

c'22 

begin 
c'23


if 


, t-j+1  t. 


c'24 

then 

c'2S 

begin 

c'26 

t"  :*  t"  f 1 

c'27 

t"  „t-i 

X 

c'28 

end 

c'29 

else  continue 

c'30

end 

' c'31 

for  j=l  Step  1 until 

c'32 

begin 

1 c'33 

t"  :=  t"  + 1 

1 

c'34 

t"  t 

X :=  a. 

J 

c'3S 

i :=  -1 

c’36 

end 

c'37 

end 

c'38 

else  continue 

c ' 39  go 

to  c'3 

and  = 3^'^ 


or  a 


t-j 

^t-j 


d'l 

Set 

Decoding  Algorithm 

the  integer  variables  t and  t"  to 

d'2 

Set 

the  integer  variable  i to  -1 

d'3 

Set 

the  binary  variables  v^  , i ^ 1 to 

d'4 

t" 

* t"  + 1 

d'S 

For 

j :»  0 Step  1 until  i do 

d'6 

begin 

d'7 

-«t-i  + l -t.  „t-j 

d'S 

then 

d'9 

begin 

d’lO 

t"  t-i 

if  =«  S ^ 

d'll 

then 

d'12 

begin 

d'13 

i j-1 

d'14 

t"  :=  t"+l 

d 15 

end 

d'16 

else 

d'l7 

begin 

d’18 

if  x^"  = 

d'19 

^-j 

then 

d'20 

begin 

d'21 

i'22 

t : t-j 

d'23 

i :»  -1 

d’24 

t"  :»  t 

d'25 

end 

I 


d'26 

d'27 

d'28 

d'29 

d'30 

d’31 

d'32 


t :*  t>l 


i i+1 


else  continue 
end 
end 

else  continue 




d'33  go  to  d'4 




Note that these algorithms are simpler than those given in Section 2, and the flags are inserted and received in order. This explains the role of $i$: if the last flag or insertion was sent at time $t-i$, it is useless to search for a root in lines c'4, c'15 and d'6 past $t-i+1$. Thus the presence of $i$ limits the scope of the search, and makes sure that at most one insertion occurs for each flag.

We will now look at conditions for classes to be allowable. Without precautions, problems can arise in two cases because part of a flag may be misinterpreted as another flag or an insertion.
Case  a) 


4*^ 

i 

. c , X.  t , , t+i  t+i 

If  (a.^^  . •••.a, 

t+i 


,t-t-i. 


. or  S ) , i > 1 , i+v^  . 

t+i  j ^ 

t + i 


the  receiver  may  detect  that  a flag  has  been  used  at  time  t+i  , or  B 
when  the  flag  is  used  at  time  t . The  same  problem  occurs  in 
Case  b) 


Ok^*' 

0(‘*\ 

< 

ft  t , t-i  t-i  a t-i.  • ^ 1 1 

-i^  “ •••  »“\j  orS  •-  ) , 1 > 1 , 1 < V -1  < V 


If the flag compositions are such that these cases never arise, all flags and insertions will be correctly recognized, and thus also the data. So a class is allowable if and only if cases a) and b) cannot occur. From this we proceed to prove three results:

1)  We  give  explicit  conditions  for  a class  containing  only  one  flag 
to  be  allowable,  and  we  determine  the  number  of  such  classes. 

2)  We show that no class containing a flag of length 1 and another flag is allowable if d=2, but many exist if d > 2.

3)  We  prove  that  if  d=2  there  are  only  two  kinds  of  allowable 
classes  containing  a flag  of  length  2 . 

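To derive the first result, we note that if only one flag (say $(\alpha_1, \ldots, \alpha_j)$ with $\beta$ being the possible insertion) is allowed in a class, situation a) never occurs while situation b) will not occur for $j \le 2$ or if the following $j-2$ inequalities are verified

$$(\alpha_2, \ldots, \alpha_{j-1}, \alpha_j \text{ or } \beta) \ne (\alpha_1, \ldots, \alpha_{j-1})$$
$$(\alpha_3, \ldots, \alpha_{j-1}, \alpha_j \text{ or } \beta) \ne (\alpha_1, \ldots, \alpha_{j-2})$$
$$\vdots$$
$$(\alpha_{j-1}, \alpha_j \text{ or } \beta) \ne (\alpha_1, \alpha_2) \qquad (4)$$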

The $i$-th condition may be interpreted as "the flag does not have period $i$", because if it is not true, $\alpha_1 = \alpha_{1+i}$, $\alpha_2 = \alpha_{2+i}$, ..., $\alpha_{j-i} = \alpha_j$ or $\beta$. A flag satisfying all these conditions will be called strong; another flag will be called weak. To check if a flag is strong, it is enough to check the last $\lfloor (j-1)/2 \rfloor$ conditions, for if a flag has period $i$, $1 \le i \le \lfloor j/2 \rfloor - 1$, it has also period $mi$ for some $mi \in \{\lfloor j/2 \rfloor, \ldots, j-3, j-2\}$.

It is of academic interest to know the number of strong flags of length $j$. We will compute how many among the $d^{j-1}$ possible roots satisfy (4) once $\alpha_j$ and $\beta$ have been chosen.

For pedagogical reasons we start by determining the number of roots such that

$$(\alpha_1, \ldots, \alpha_i) \ne (\alpha_{j-i}, \ldots, \alpha_{j-1}), \quad i = 1, 2, \ldots, j-2 \qquad (5)$$

These roots will be called strong and their number denoted by $d^{j-1} \gamma(j-1)$; the other roots will be called weak. If a root is weak, let $i_0$ be the least $i$ for which (5) does not hold. Then as we have seen, $i_0 \le \lfloor (j-1)/2 \rfloor$. Then $(\alpha_1, \ldots, \alpha_{i_0}) = (\alpha_{j-i_0}, \ldots, \alpha_{j-1})$ and is a strong root of a flag of length $i_0 + 1$. For every such root there will be $d^{j-1-2i_0}$ distinct weak roots of length $j-1$ ($\alpha_{i_0+1}, \ldots, \alpha_{j-1-i_0}$ can take all possible values). Thus the number of weak roots of length $j-1$ is equal to

$$d^{j-1}(1 - \gamma(j-1)) = \sum_{i=1}^{\lfloor (j-1)/2 \rfloor} d^{i}\, \gamma(i)\, d^{j-1-2i}$$

or

$$\gamma(j-1) = 1 - \sum_{i=1}^{\lfloor (j-1)/2 \rfloor} d^{-i}\, \gamma(i) \qquad (6)$$

We see that $\gamma(2k) = \gamma(2k+1) = 1 - \sum_{i=1}^{k} d^{-i} \gamma(i)$ and that $\gamma$ is a decreasing non negative function of $k$, thus it has a limit, $\gamma(\infty)$ say, as $k$ increases. We will bound $\gamma(\infty)$ and show that it is positive.
From (6) one finds

$$\gamma(4k+2) = 2 - \left(1 + \frac{1}{d}\right) \sum_{i=0}^{k} d^{-2i}\, \gamma(2i) \quad \text{with } \gamma(0) := 1$$

so

$$\gamma(\infty) = \gamma(4k+2) - \left(1 + \frac{1}{d}\right) \sum_{i=k+1}^{\infty} d^{-2i}\, \gamma(2i)$$

and because $\gamma$ is a decreasing function

$$\gamma(\infty) < \gamma(4k+2) - \left(1 + \frac{1}{d}\right) \gamma(\infty) \sum_{i=k+1}^{\infty} d^{-2i}$$

Thus

$$\gamma(4k+2) - \frac{1}{(d-1)\, d^{2k+1}}\, \gamma(2k+2) < \gamma(\infty) < \left(1 - \frac{1}{(d-1)\, d^{2k+1} + 1}\right) \gamma(4k+2) \qquad (7)$$

In particular, for $k=0$, using the fact that $\gamma(2) = 1 - \frac{1}{d}$,

$$0 < 1 - \frac{1}{d} - \frac{1}{d^2} < \gamma(\infty) < \left(1 - \frac{1}{d}\right)\left(1 - \frac{1}{d^2 - d + 1}\right)$$

These bounds are extremely tight for $d \gg 1$.

We are grateful to Prof. Massey for pointing out that [Nielsen, 1973] obtained by a similar method but in a different context the same expression for $\gamma(i)$, and the same lowerbound for $\gamma(\infty)$. A strong root is called there bifix-free. Tables of numerical values are also given; in particular for d=2,

γ(0) = 1
γ(2) = .5
γ(4) = .375
γ(6) = .3125
γ(8) = .2891
γ(∞) ≈ .2678

whereas from (7) with k=2

.2675 < γ(∞) < .2690
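The recursion (6) and the bounds (7) are easy to check by machine. The sketch below is illustrative and not taken from the thesis: it counts strong (bifix-free) roots by brute force, compares the count with $d^n \gamma(n)$ obtained from (6), and evaluates (7); the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

def is_strong_root(root):
    """True if no proper prefix of the root equals the suffix of the same length, as in (5)."""
    n = len(root)
    return all(root[:i] != root[n - i:] for i in range(1, n))

def count_strong_roots(d, n):
    """Brute-force count of strong roots of length n over an alphabet of size d."""
    return sum(1 for root in product(range(d), repeat=n) if is_strong_root(root))

@lru_cache(maxsize=None)
def gamma(n, d):
    """Fraction of strong roots of length n, from recursion (6), with gamma(0) = 1."""
    return Fraction(1) - sum(Fraction(1, d ** i) * gamma(i, d)
                             for i in range(1, n // 2 + 1))

def gamma_limit_bounds(k, d):
    """Lower and upper bounds on the limit of gamma, from (7)."""
    scale = (d - 1) * d ** (2 * k + 1)
    lower = gamma(4 * k + 2, d) - gamma(2 * k + 2, d) / scale
    upper = (1 - Fraction(1, scale + 1)) * gamma(4 * k + 2, d)
    return lower, upper

for n in (2, 4, 6, 8):
    print(n, count_strong_roots(2, n), float(gamma(n, 2)))
print([float(x) for x in gamma_limit_bounds(2, 2)])
```

For d=2 the brute-force counts agree with (6), for example 74 strong roots out of 256 for length 8, and (7) with k=2 brackets the limit between about .2676 and .2689.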




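Now that we have seen the mechanics of the proof, we attack the problem of finding the number of strong flags of length $j$, terminating with a given $\alpha_j$ and using a given $\beta$ as possible insertion. We define this number as $d^{j-1} \delta(j)$.

If a flag is weak, there is a smallest $i \in \{2, 3, \ldots, \lfloor (j+1)/2 \rfloor\}$ such that

$$(\alpha_1, \ldots, \alpha_i) = (\alpha_{j+1-i}, \ldots, \alpha_{j-1}, \alpha_j \text{ or } \beta)$$

On the other hand from every strong flag $(\alpha_1, \ldots, \alpha_i)$ of length $i$ one can build $2 d^{j-2i}$ distinct weak flags of length $j \ge 2i$, say $(\alpha'_1, \ldots, \alpha'_j)$, by choosing $\alpha'_1 = \alpha_1, \ldots, \alpha'_i = \alpha_i$ or $\beta$, $\alpha'_{j+1-i} = \alpha_1, \ldots, \alpha'_j = \alpha_i$, and choosing $\alpha'_{i+1}, \ldots, \alpha'_{j-i}$ arbitrarily. From every strong flag of length $i$ such that $\alpha_1 = \alpha_i$ or $\beta$, one can obtain the weak flag $(\alpha_1, \ldots, \alpha_{i-1}, \alpha_1, \alpha_2, \ldots, \alpha_i)$ of length $2i-1$. Noting, by induction on $i$, that the fraction of strong flags of length $i$ that have $\alpha_1 = \alpha_i$ or $\beta$ is $2/d$, we can write in general

$$d^{j-1}(1 - \delta(j)) = \sum_{i=2}^{\lfloor (j+1)/2 \rfloor} 2\, d^{j-2i}\, d^{i-1}\, \delta(i)$$

thus

$$\delta(j) = 1 - 2 \sum_{i=2}^{\lfloor (j+1)/2 \rfloor} d^{-i}\, \delta(i)$$

As was the case for $\gamma$, $\delta$ is a non increasing function of $i$:

$$\delta(2j) = \delta(2j-1) = 1 + \frac{2}{d} - 2 \sum_{i=1}^{j} d^{-i}\, \delta(i) \quad \text{where } \delta(1) := 1.$$

We can thus write

$$\delta(4k+3) = 1 + \frac{2}{d} - 2\left(1 + \frac{1}{d}\right) \sum_{i=0}^{k} d^{-(2i+1)}\, \delta(2i+1)$$

and

$$\lim_{i \to \infty} \delta(i) = \delta(4k+3) - 2\left(1 + \frac{1}{d}\right) \sum_{i=k+1}^{\infty} d^{-(2i+1)}\, \delta(2i+1) \quad \forall k \ge 0$$

As before this permits bounding $\delta(\infty)$:

$$\delta(4k+3) - \frac{\delta(2k+3)}{d^{2k+2}(d-1)/2} < \delta(\infty) < \left(1 - \frac{1}{1 + d^{2k+2}(d-1)/2}\right) \delta(4k+3), \quad k \ge 0$$

Using the fact that $\delta(3) = 1 - 2/d^2$:

$$\left(1 - \frac{2}{d^2}\right)\left(1 - \frac{1}{d^2(d-1)/2}\right) < \delta(\infty) < \left(1 - \frac{2}{d^2}\right)\left(1 - \frac{1}{1 + d^2(d-1)/2}\right)$$

Of course for the binary case $\delta(i+1) = \gamma(i)$.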
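The recursion for $\delta$ can be evaluated in the same way as the one for $\gamma$. The sketch below only iterates the recursions as they are stated in the text (it is not an independent count of strong flags), and checks the binary identity $\delta(i+1) = \gamma(i)$ and the $k=0$ bounds; the function names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def gamma(n, d):
    """gamma(n) from recursion (6), with gamma(0) = 1."""
    return Fraction(1) - sum(Fraction(1, d ** i) * gamma(i, d)
                             for i in range(1, n // 2 + 1))

@lru_cache(maxsize=None)
def delta(j, d):
    """delta(j) = 1 - 2 * sum_{i=2}^{floor((j+1)/2)} d^-i delta(i), with delta(1) = 1."""
    return Fraction(1) - 2 * sum(Fraction(1, d ** i) * delta(i, d)
                                 for i in range(2, (j + 1) // 2 + 1))

def delta_limit_bounds(k, d):
    """Lower and upper bounds on the limit of delta, as stated in the text."""
    scale = Fraction(d ** (2 * k + 2) * (d - 1), 2)
    lower = delta(4 * k + 3, d) - delta(2 * k + 3, d) / scale
    upper = (1 - 1 / (1 + scale)) * delta(4 * k + 3, d)
    return lower, upper

print([float(delta(j, 2)) for j in range(1, 10)])
print([float(x) for x in delta_limit_bounds(0, 3)], float(delta(61, 3)))
```

For d=3 and k=0 the bounds evaluate to 56/81 and 7/10, and iterating the recursion far out gives a value of about .692, inside that interval.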

This concludes the analysis of classes containing only one flag. To show the second result mentioned above, note that situation a) cannot be avoided if d=2 and if the flag used at time $t$ has length $j > 1$ while the flag used at time $t+1$ has unit length. On the other hand, if d=3 and the flags are 0, 02, 022 etc. with "1" being eventually inserted, situations a) and b) never occur.

We prove now the third result: suppose that d=2 and that a class contains a flag of length 2 and other flags. If the root of the flag of length 2 is "0", situation a) is avoided only by having all symbols in the other flags, except the first and the last, be equal to "1". Because the flags must be strong, the first symbol must be different from the penultimate. Thus we conclude that the root of a flag must have the form (0,1,1,...,1). If the root of the flag of length 2 is "1", the same conclusion arises, with all "0"s replaced by "1", and conversely. In both cases, the last symbol can be chosen freely independently in all flags. One checks that these classes are allowable.


Chapter 4

Encoding Message Starting Times

1. Introduction

We consider in this chapter a seemingly trivial problem, which was mentioned briefly in Section 2 of Chapter 2. We will not be able to solve it completely, but we will gain some insight into the peculiarities of making information theoretic and coding theoretic statements in a queueing environment.

The model is the following: an asynchronous memoryless stationary source emits messages which are stored in an infinite buffer and transmitted over a noiseless synchronous binary link with a capacity of 1 bit per unit of time. We assume that the interemission times and message lengths are mutually independent random variables and that each message contains a "codeword" indicating its length. By this we mean that if the receiver knows when a message starts it will be able to detect the end of the message from the information provided by the message itself. This can be done by prefixing a message with a codeword indicating its length, or by using flags as explained in Chapter 3, or simply by using messages that are codewords from a prefix condition code, as in Chapter 2. We denote an interarrival (service) time by $a$ ($b$) and by $A$ ($B$) its probability distribution function, and assume $Ea > Eb > 0$.
2. Discussion of the Problem

Because the arrivals and lengths are random, it may happen that the buffer becomes empty. The line being synchronous, something must still be sent out, and the receiver must be able to distinguish these idle bits from the data bits. From another point of view, this is equivalent to recognizing when the line becomes busy, i.e. detecting the message starting times. There are many possible strategies to do this, the most obvious one being to transmit "0"'s when the line is idle, and prefix every message with a "1". Naturally one asks which is the "best" strategy. We should first agree on the meaning of "best."

If we define as protocol bit a bit which is not a message bit (in the previous example, the idle bits "0" and the prefix bits "1" would be the protocol bits), it seems reasonable to find the strategy which minimizes the average number of protocol bits per message, i.e. the limit (if it exists and is constant with probability one) as the time goes to infinity of the ratio of the number of protocol bits sent to the number of message arrivals. Unfortunately this criterion is most useless, for all strategies resulting in a stable system have the same average number of protocol bits per message, and this number is equal to $Ea - Eb$. This is so because if the system is stable, the law of large numbers says that the average total number of bits per message is $Ea$, and $Eb$ of these are message bits.
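A small simulation illustrates the point for the "idle 0, prefix 1" strategy mentioned above. This is a sketch under assumptions that are not in the original: interarrival times are uniform on 1..19 (so Ea = 10), message lengths are uniform on 1..7 (so Eb = 4), and the function name is hypothetical.

```python
import random

def protocol_bits_per_message(n_messages, seed=0):
    """Simulate the link and return (protocol bits, total bits, message bits) per message."""
    rng = random.Random(seed)
    arrival = 0        # arrival time of the current message
    finish = 0         # time at which the link finishes the previous message
    idle_bits = 0
    message_bits = 0
    for _ in range(n_messages):
        arrival += rng.randint(1, 19)      # assumed interarrival law, mean 10
        length = rng.randint(1, 7)         # assumed message length law, mean 4
        start = max(arrival, finish)
        idle_bits += start - finish        # "0" bits sent while the buffer was empty
        finish = start + 1 + length        # one prefix bit "1", then the message itself
        message_bits += length
    protocol_bits = idle_bits + n_messages # idle bits plus one prefix bit per message
    return (protocol_bits / n_messages,
            finish / n_messages,
            message_bits / n_messages)

ratio, total, msg = protocol_bits_per_message(200_000)
print(ratio, total, msg)   # ratio should be close to Ea - Eb = 6
```

Every bit on the line is either a message bit or a protocol bit, so the protocol bits per message equal the total bits per message minus the message bits per message, whatever the strategy.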

This result is thus trivial, although surprising at first sight. Its information theoretic meaning is that although the amount of information carried during an idle period may be small, it cannot be encoded efficiently. To give more sense to the concept of protocol bit, we can do the following: suppose that we have at our disposition an infinite reserve of "low priority" bits (this could represent some kind of service information) that we can transmit when we wish. Thus there is no reason for the line to be idle, but we may still need protocol bits (defined as bits that are not data nor low priority bits) to differentiate between the two other kinds of bits. Note that, as before, for a stable system, the expected number of protocol bits per message plus the expected number of low priority bits per message equals $Ea - Eb$. We can now ask the question: what is the infimum of the average number of protocol bits per message? The answer is 0, and this can be approached by the following strategy: send $C$ (meant to be large) low priority bits, then a codeword indicating the number $n$ of message arrivals since the last such codeword has been sent, then the $n$ messages. Repeat the process.

The average number of protocol bits per message will be equal to the expected codeword length divided by $En$. If the codewords are well chosen, the expected codeword length will be smaller than $(En+1) H(1/(En+1)) + 1$ [Gallager, 1968, p. 507], thus the average number of protocol bits per message is smaller than $(1 + 1/En) H(1/(En+1)) + 1/En$. Clearly, as $C$ goes to $\infty$ so does $En$, thus the average number of protocol bits per message goes to zero. The drawback of this strategy is that the average message waiting time goes to infinity as $C$ increases.
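The decay of this bound is easy to tabulate. The following sketch simply evaluates the expression $(1 + 1/En) H(1/(En+1)) + 1/En$ from the paragraph above for growing $En$; the function names are illustrative.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def protocol_bits_bound(mean_n):
    """Upper bound on protocol bits per message when a codeword covers mean_n messages on average."""
    return (1.0 + 1.0 / mean_n) * binary_entropy(1.0 / (mean_n + 1.0)) + 1.0 / mean_n

for mean_n in (1, 10, 100, 1000, 10 ** 6):
    print(mean_n, protocol_bits_bound(mean_n))
```

The bound falls roughly like $\log_2(En)/En$, so a large $C$ is needed to make the overhead small, which is exactly what drives the waiting time up.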

A meaningful problem would thus be to find a coding scheme minimizing the average message waiting time for a given average number of protocol bits per message. We are unable to solve this problem, or even to lowerbound the expected waiting time. We will be content to study the following class of flag strategies:


Ideally we should let this scheme be adaptive, i.e. we should allow flag and low priority bit sequence lengths to be functions of the times of reception and lengths of the previous messages, flags and low priority bit sequences. This is known to the receiver. In light of the results of Chapter 3 and of the fact that this scheme sends flags when the buffer is empty, which has a favorable influence on the message


We have thus removed much of the variability of the flag and low priority bit sequence lengths, allowing only the length of the low priority bit sequence immediately following a message to be a function of that message length $b$, and allowing the first flag and low priority bit sequence in an idle period to be different from the others in the idle period. We
assume  that  with  probability  one  a message  is  longer  than  Max  ’ 

otherwise  some  messages  cannot  be  considered  as  being  received  when  they 
are  fully  transmitted! 

To  be  able  to  obtain  analytical  results  we  will  also  model  the 
arrival  process  as  Poisson.  The  analysis  will  proceed  in  steps;  in 
Section  3 we  will  study  a general  queueing  model  whose  parameters  will 
be  identified  in  Section  4 so  that  it  represents  the  flag  strategy  we 
want  to  examine.  The  main  results  will  be  given  in  Section  5,  while  the 
optimal  function  will  be  looked  at  in  Section  6.  We  will  give 

numerical  results  in  Section  7. 
3. M/G/1 Queues with Overhead

A. Introduction

We analyze here the following problem: arrivals in a queue follow a Poisson process with rate $\lambda := 1/Ea$ and the service times have distribution $B'$. The first customer in a busy period suffers an extra delay with distribution $F_1$, while the services of the other customers are increased by a random amount with distribution $F_2$. We assume that the interarrival times, service times and extra delays are all independent. We will study the stationary distribution of the number of customers in the queue, the mean waiting time, and the joint distribution of the busy period length and of the number of customers in the busy period.

B.  Stationary  Distribution  of  the  Number  of  Customers  in  the  Queue 

Let $x_n$ be the number of customers in the queue right after the $n$-th customer has left the system and let $\Pi_n$ be the probability mass function of $x_n$. We have the following recursive relation between the $x_n$'s:

$$x_n = x_{n-1} + (\text{number of arrivals during the } n\text{-th service}) - 1_{x_{n-1} > 0}$$

It is well known that the number of arrivals during the $n$-th service has a generating function equal to $F_1^*(\lambda - \lambda z) B'^*(\lambda - \lambda z)$ or $F_2^*(\lambda - \lambda z) B'^*(\lambda - \lambda z)$, depending on whether or not $x_{n-1} = 0$. Denoting by $\Pi_n^*$ the z-transform of $\Pi_n$, we have immediately

$$\Pi_n^*(z) = \Pi_{n-1}^*(0)\, F_1^*(\lambda - \lambda z)\, B'^*(\lambda - \lambda z) + \left(\Pi_{n-1}^*(z) - \Pi_{n-1}^*(0)\right) \frac{F_2^*(\lambda - \lambda z)\, B'^*(\lambda - \lambda z)}{z}$$

By classical methods [Karlin, 1975, pp. 96-102] one sees that the system is ergodic if and only if $\lambda(Eb' + Ef_2) < 1$; in this case the z-transform $\Pi^*$ of the stationary distribution $\Pi$ must be equal to

$$\Pi^*(z) = \Pi^*(0)\, \frac{B'^*(\lambda - \lambda z)\left(z F_1^*(\lambda - \lambda z) - F_2^*(\lambda - \lambda z)\right)}{z - F_2^*(\lambda - \lambda z)\, B'^*(\lambda - \lambda z)}$$

and $\Pi^*(1)$ must equal 1, so, using L'Hopital's rule as $z \to 1$,

$$\Pi^*(0) = \frac{1 - \lambda E(b' + f_2)}{1 + \lambda E(f_1 - f_2)}$$

thus

$$\Pi^*(z) = \left[ \left(1 - \lambda E(b' + f_2)\right) \frac{B'^*(\lambda - \lambda z)\, F_2^*(\lambda - \lambda z)(z - 1)}{z - F_2^*(\lambda - \lambda z)\, B'^*(\lambda - \lambda z)} \right] \left[ \frac{1}{1 + \lambda E(f_1 - f_2)}\, \frac{z F_1^*(\lambda - \lambda z) - F_2^*(\lambda - \lambda z)}{F_2^*(\lambda - \lambda z)(z - 1)} \right]$$

If $F_1 = F_2$, the second factor in brackets is equal to one, and we obtain the Pollaczek formula for M/G/1 queues with service distribution $F_2 * B'$.

$\Pi$ is also the stationary distribution of the number of customers in the queue at an arrival time [Kleinrock, 1975, p. 176], and, because the arrival process is Poisson, also at a random time.
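The expression for $\Pi^*(0)$ can be compared with a simulation. The sketch below is illustrative only and makes assumptions that are not in the text: the service time and both extra delays are taken to be deterministic, and the function names are hypothetical. It estimates the fraction of customers that find the system empty, which by the remark above has the same stationary value as $\Pi^*(0)$.

```python
import random

def empty_fraction(lam, service, first_extra, other_extra, n_customers, seed=0):
    """Fraction of Poisson arrivals that find the system empty (FIFO, single server)."""
    rng = random.Random(seed)
    arrival = 0.0
    finish = 0.0            # departure time of the previous customer
    found_empty = 0
    for _ in range(n_customers):
        arrival += rng.expovariate(lam)
        if arrival >= finish:
            # This customer opens a busy period and suffers the extra delay f1.
            found_empty += 1
            finish = arrival + service + first_extra
        else:
            # Customer inside a busy period: service is increased by f2.
            finish = finish + service + other_extra
    return found_empty / n_customers

def pi_zero(lam, mean_service, mean_f1, mean_f2):
    """Stationary probability of an empty system, as given in the text."""
    return (1.0 - lam * (mean_service + mean_f2)) / (1.0 + lam * (mean_f1 - mean_f2))

print(empty_fraction(0.5, 1.0, 1.0, 0.25, 200_000, seed=1), pi_zero(0.5, 1.0, 1.0, 0.25))
```

With these parameters the formula gives 0.375/1.375, about 0.273, and the simulated fraction should agree to within sampling error.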


C.  Average  Delay 

Combining the remark at the end of the last section with Little's formula [Little, 1961], one obtains by differentiating $\Pi^*(z)$ the following formula for the average message delay, where the delay is defined as the difference between the times of service completion and message arrival:

$$E(d) = Eb' + Ef_2 + \frac{1}{2}\, \frac{\lambda\left(Eb'^2 + 2 Ef_2\, Eb' + Ef_2^2\right)}{1 - \lambda(Eb' + Ef_2)} + \frac{1}{1 + \lambda(Ef_1 - Ef_2)} \left( (Ef_1 - Ef_2)(1 - \lambda Ef_2) + \frac{1}{2} \lambda \left(Ef_1^2 - Ef_2^2\right) \right)$$

$$= Eb' + \frac{1}{2}\, \frac{\lambda\left(Eb'^2 + 2 Ef_2\, Eb' + Ef_2^2\right)}{1 - \lambda(Eb' + Ef_2)} + \frac{1}{1 + \lambda(Ef_1 - Ef_2)} \left( Ef_1 + \frac{1}{2} \lambda \left(Ef_1^2 - Ef_2^2\right) \right) \qquad (1)$$

Here $Eb'^2$, $Ef_1^2$ and $Ef_2^2$ denote second moments.
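As a sanity check on formula (1), the sketch below evaluates both forms and verifies that with no extra delays it reduces to the familiar M/M/1 mean delay $1/(\mu - \lambda)$. The parameter names are illustrative; the second moments are passed explicitly.

```python
def mean_delay_first_form(lam, Eb, Eb2, Ef1, Ef1_2, Ef2, Ef2_2):
    """First form of E(d): Eb' + Ef2 + Pollaczek term + correction for the first customer."""
    pollaczek = 0.5 * lam * (Eb2 + 2.0 * Ef2 * Eb + Ef2_2) / (1.0 - lam * (Eb + Ef2))
    correction = ((Ef1 - Ef2) * (1.0 - lam * Ef2) + 0.5 * lam * (Ef1_2 - Ef2_2)) \
        / (1.0 + lam * (Ef1 - Ef2))
    return Eb + Ef2 + pollaczek + correction

def mean_delay(lam, Eb, Eb2, Ef1, Ef1_2, Ef2, Ef2_2):
    """Second form of E(d), formula (1)."""
    pollaczek = 0.5 * lam * (Eb2 + 2.0 * Ef2 * Eb + Ef2_2) / (1.0 - lam * (Eb + Ef2))
    correction = (Ef1 + 0.5 * lam * (Ef1_2 - Ef2_2)) / (1.0 + lam * (Ef1 - Ef2))
    return Eb + pollaczek + correction

# Exponential service with rate mu = 1 and no extra delay: expect 1 / (mu - lam).
print(mean_delay(0.5, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0))
# A case with overhead: both forms should agree.
print(mean_delay_first_form(0.3, 1.5, 3.0, 0.8, 1.0, 0.2, 0.2),
      mean_delay(0.3, 1.5, 3.0, 0.8, 1.0, 0.2, 0.2))
```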


D.  Busy  Periods 

Denote by $g$ and $m$ respectively the length of and the number of customers served in a busy period. We will characterize the function $GM^*(s,z) := E[z^m e^{-sg}]$.

It is well known [Kleinrock, 1975] that if $F_1 = F_2$, $GM^*(s,z)$, there denoted $GM_2^*(s,z)$, satisfies the relation

$$GM_2^*(s,z) = z\, F_2^*(s + \lambda - \lambda GM_2^*(s,z))\, B'^*(s + \lambda - \lambda GM_2^*(s,z)).$$

We will express $GM^*(s,z)$ in terms of $GM_2^*(s,z)$ as follows: let $b'_1$ and $f_1$ be the lengths of the first service and extra delay, and $n_1$ be the number of arrivals during $b'_1$ and $f_1$. We then have

$$E(z^m e^{-sg} \mid b'_1, f_1, n_1) = z\, e^{-s(b'_1 + f_1)} \left(GM_2^*(s,z)\right)^{n_1}$$

because the $n_1$ arrivals will generate $n_1$ independent busy periods characterized by $GM_2^*$. Averaging on $n_1$, $b'_1$, and $f_1$, one obtains

$$GM^*(s,z) = z\, F_1^*(s + \lambda - \lambda GM_2^*(s,z))\, B'^*(s + \lambda - \lambda GM_2^*(s,z)) = GM_2^*(s,z)\, \frac{F_1^*(s + \lambda - \lambda GM_2^*(s,z))}{F_2^*(s + \lambda - \lambda GM_2^*(s,z))} \qquad (2)$$


4. Identification of $B'$, $F_1$ and $F_2$

In the analysis of the queueing system of the previous section we have obtained expressions that involve the Laplace-Stieltjes transforms $B'^*$, $F_1^*$ and $F_2^*$. We will identify them from the previous description of the coding scheme.

B'  will  be  the  probability  distribution  function  of  b':  = 
b CqGj)  , i.e.  b'  is  the  sum  of  the  lengths  of  a message  and  of  the 
low  priority  bit  sequence  that  immediately  follows  it. 

$F_2$ will correspond to the extra delay for a message in the middle of the busy period. In our scheme, $f_2$ will be equal to 0 or 1, depending on whether or not an insertion is needed. So $F_2^*(s) = 1 + 2^{-(v_1 - 1)}(e^{-s} - 1)$ if the first $v_1 - 1$ bits of all messages are equally likely, and $Ef_2 = Ef_2^2 = 2^{-(v_1 - 1)}$.


It is harder to compute $F_1^*$. We start by solving the following problem. Let the times $0, t_1, t_2, \ldots$ form a renewal process, the probability distribution function of $t_1$ being $C_1$ ($C_1(0^+) = 0$), and the distribution of $t_n - t_{n-1}$ being $C_2$ ($C_2(0^+) = 0$), $n = 2, 3, \ldots$ At a random time $t$, independent of the renewal process and with distribution function $1 - e^{-\lambda t}$, $t \ge 0$, a "superevent" occurs. We wish to find the Laplace-Stieltjes transform of the distribution of the random variable $f_1$ defined as follows:

$$f_1 := \min_{n:\, t_n \ge t} (t_n - t) + d_1 I(t \le t_1) + d_2 I(t > t_1)$$

where $I(\cdot)$ is the indicator function and $d_1$ and $d_2$ are random variables independent of the renewal process and of $t$, with distributions $D_1$ and $D_2$ respectively. In other words, $f_1$ is equal to the time between the occurrences of the superevent and the following event, plus a random variable whose distribution is $D_1$ if the superevent occurs before the first event, and $D_2$ otherwise.

We have immediately:

$$F_1^*(s) = \Pr(t \le t_1)\, E[e^{-s(t_1 - t)} \mid t \le t_1]\, D_1^*(s) + \Pr(t > t_1)\, E[e^{-s \min_{t_n \ge t}(t_n - t)} \mid t > t_1]\, D_2^*(s) \qquad (4)$$


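We compute now:

$$\Pr(t > t_1) = \int_0^\infty e^{-\lambda t_1}\, dC_1(t_1) = C_1^*(\lambda) \qquad (5)$$

$$E[e^{-s(t_1 - t)} \mid t \le t_1] = \frac{1}{1 - C_1^*(\lambda)} \int_0^\infty dC_1(t_1) \int_0^{t_1} dt\, \lambda e^{-\lambda t} e^{-s(t_1 - t)} = \frac{\lambda}{\lambda - s}\, \frac{C_1^*(s) - C_1^*(\lambda)}{1 - C_1^*(\lambda)}$$

Similarly, because $t$ is "memoryless,"

$$E[e^{-s(t_n - t)} \mid t_{n-1} < t \le t_n] = \frac{\lambda}{\lambda - s}\, \frac{C_2^*(s) - C_2^*(\lambda)}{1 - C_2^*(\lambda)}, \qquad n > 1 .$$

The right hand side is independent of $n$, given $n > 1$; thus

$$E[e^{-s \min_{t_n \ge t}(t_n - t)} \mid t > t_1] = \frac{\lambda}{\lambda - s}\, \frac{C_2^*(s) - C_2^*(\lambda)}{1 - C_2^*(\lambda)} .$$

Plugging these results into (4), we obtain

$$F_1^*(s) = \frac{\lambda}{\lambda - s} \left[ (C_1^*(s) - C_1^*(\lambda))\, D_1^*(s) + \frac{C_1^*(\lambda)}{1 - C_2^*(\lambda)}\, (C_2^*(s) - C_2^*(\lambda))\, D_2^*(s) \right]$$

and by differentiation,

$$Ef_1 = -\frac{1}{\lambda} + Ec_1 + \frac{C_1^*(\lambda)}{1 - C_2^*(\lambda)}\, Ec_2 + (1 - C_1^*(\lambda))\, Ed_1 + C_1^*(\lambda)\, Ed_2$$

$$Ef_1^2 = -\frac{2}{\lambda}\, Ef_1 + Ec_1^2 + 2 Ec_1 Ed_1 + \frac{C_1^*(\lambda)}{1 - C_2^*(\lambda)}\, (Ec_2^2 + 2 Ec_2 Ed_2) + (1 - C_1^*(\lambda))\, Ed_1^2 + C_1^*(\lambda)\, Ed_2^2$$

We will use the fact that

$$Ef_1 + \tfrac{1}{2} \lambda Ef_1^2 = \tfrac{1}{2} \lambda \left[ Ec_1^2 + 2 Ec_1 Ed_1 + \frac{C_1^*(\lambda)}{1 - C_2^*(\lambda)}\, (Ec_2^2 + 2 Ec_2 Ed_2) + (1 - C_1^*(\lambda))\, Ed_1^2 + C_1^*(\lambda)\, Ed_2^2 \right]$$

We will need later the fact that

$$\Pr(t > t_n) = C_1^*(\lambda)\, (C_2^*(\lambda))^{n-1}, \qquad n = 1, 2, \ldots \qquad (6)$$

for the same reason as (5).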

In our coding scheme, the "superevent" will be the arrival of a message. $C_i$ will correspond to the distribution of the flag of length $v_i$ plus the low priority bit sequence of length $\zeta_i$. Thus $C_i^*(s) = e^{-s(v_i + \zeta_i)}$, $i = 1, 2$. $d_1$ and $d_2$ will be equal to zero, except if an insertion is needed in the first message of a busy period, so $D_1^* = D_2^* = 1 + 2^{-(v_2-1)} (e^{-s} - 1)$, and $Ed_1 = Ed_2 = Ed_1^2 = Ed_2^2 = 2^{-(v_2-1)}$, under the usual assumptions.


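Thus,

$$Ef_1 = -\frac{1}{\lambda} + v_1 + \zeta_1 + \frac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}}\, (v_2 + \zeta_2) + 2^{-(v_2-1)} \qquad (7)$$

$$Ef_1 + \tfrac{1}{2} \lambda Ef_1^2 = \tfrac{1}{2} \lambda \left( (v_1 + \zeta_1)^2 + 2 (v_1 + \zeta_1)\, 2^{-(v_2-1)} + \frac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}} \left( (v_2 + \zeta_2)^2 + 2 (v_2 + \zeta_2)\, 2^{-(v_2-1)} \right) + 2^{-(v_2-1)} \right) \qquad (8)$$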
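As a sanity check on the mean extra delay $Ef_1$ of formula (7), the following sketch compares the closed form with a direct numerical integration over the arrival time of the superevent. It is only an illustration under stated assumptions: the names `mean_extra_delay` and `mean_residual_numeric` are hypothetical, `c1` and `c2` stand for the deterministic interval lengths $v_1 + \zeta_1$ and $v_2 + \zeta_2$, and the insertion term $2^{-(v_2-1)}$ is handled separately from the residual waiting time.

```python
import math

def mean_extra_delay(lam, c1, c2, v2):
    """Closed form of formula (7) with c1 = v1 + zeta1 and c2 = v2 + zeta2."""
    insertion = 2.0 ** (-(v2 - 1))
    ratio = math.exp(-lam * c1) / (1.0 - math.exp(-lam * c2))
    return -1.0 / lam + c1 + ratio * c2 + insertion

def mean_residual_numeric(lam, c1, c2, horizon=150.0, steps=150000):
    """Midpoint-rule estimate of E[next event time - t], t exponential(lam)."""
    dt = horizon / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        if t <= c1:
            next_event = c1
        else:
            next_event = c1 + c2 * math.ceil((t - c1) / c2)
        total += (next_event - t) * lam * math.exp(-lam * t) * dt
    return total

closed_form = mean_extra_delay(0.1, 3.0, 2.0, 2) - 0.5  # remove insertion term
numeric = mean_residual_numeric(0.1, 3.0, 2.0)
print(round(closed_form, 4), round(numeric, 4))
```

The two values should agree to within the integration error, since with deterministic intervals the residual time depends only on which interval the exponential arrival time falls into.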


5.  Main  Result 


Putting  together  all  the  results  of  the  previous  sections,  we 
obtain  a formula  for  the  average  message  waiting  time  as  a function  of 


Co(b3  > • ^2  ’ ^1  '^2  ■ Cl).  C33  , (73  and  (83 

-(V  -13  -(V  -13 

A (2  +22  (Eb  + EC  3 + E b'  3 

r. . _ I,  0 


- (v-i  -13 

l-X(Eb  + EC+2  3 

° -(V  -13  -^Cv^-Cj) 

1 - e ^ ^ 


A(v^+Cj) 


C-  + V.  + 

i i 


1 - e 
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2 -K-i)  -(y.-i) 

■^Tv^3  -(v.-i) 

2 ^ . 2 ^ 


-Cv,-1) 
2 ^ 


(9) 


We also obtain the average number of low priority bits per message, which, after a moment of reflection, must be equal to

$$E\zeta_0 + \frac{E(\text{number of low priority bits in an idle period})}{Em}$$

where $m$ has been defined as the number of messages in a busy period. By substituting (3) and (7) in (2) one obtains

$$Em = \frac{\lambda \left( v_1 + \zeta_1 + \dfrac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}}\, (v_2 + \zeta_2) + 2^{-(v_2-1)} - 2^{-(v_1-1)} \right)}{1 - \lambda (Eb + E\zeta_0 + 2^{-(v_1-1)})}$$

In the parlance of Section 4, the number of low priority bits in an idle period is equal to $\zeta_1 + i \zeta_2$ if the superevent occurs between $t_i$ and $t_{i+1}$. Thus its expected value is equal to

$$\zeta_1 + \zeta_2\, \frac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}}$$

as can be seen by using (6).

The expected number of low priority bits per message is thus equal to

$$E\zeta_0 + \frac{\left( \zeta_1 + \zeta_2\, \dfrac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}} \right) \left( \dfrac{1}{\lambda} - Eb - E\zeta_0 - 2^{-(v_1-1)} \right)}{v_1 + \zeta_1 + \dfrac{e^{-\lambda(v_1 + \zeta_1)}}{1 - e^{-\lambda(v_2 + \zeta_2)}}\, (v_2 + \zeta_2) + 2^{-(v_2-1)} - 2^{-(v_1-1)}} \qquad (10)$$

What is left to do is to try to minimize $E(w)$ on $\zeta_0(b)$, $\zeta_1$, $\zeta_2$, $v_1$ and $v_2$ while keeping the expected number of low priority bits per message fixed. In the next section we will gain some insight into the problem of optimizing on $\zeta_0(b)$ for $E\zeta_0$ fixed. This will reduce the problem to optimizing on $\zeta_1$, $\zeta_2$, $v_1$ and $v_2$, which will require numerical computations.

6.  Optimization of $\zeta_0(b)$

We decided to let the length $\zeta_0$ of the low priority bit sequence following a message be a function of the length $b$ of this message. Denoting $b + \zeta_0(b)$ by $b'$, we see from formulas (9) and (10) that the average message delay depends on $Eb'$ and $Eb'^2$ while the expected number of low priority bits per message depends on $Eb'$. The question then arises of how $\zeta_0(b)$ should be defined so as to minimize $Eb'^2$ for given $Eb'$ and $B$.

We will solve this problem for the case where $\zeta_0(b)$ may take non-integer values. This will give some insight and a lower bound for the interesting case when $\zeta_0(b)$ takes only integer values, which is an infinite-dimensional nonlinear integer programming problem.

We must find

$$\min_{\zeta_0(b)} \int_0^\infty (b + \zeta_0(b))^2\, dB(b)$$

subject to the constraints:

$$\zeta_0(b) \ge 0$$

$$\int_0^\infty \zeta_0(b)\, dB(b) \ge E(\zeta_0) > 0 \qquad (11)$$

We start by defining $\alpha$ as the only root of the equation

$$\int_0^x (x - b)\, dB(b) = E(\zeta_0)$$

This root exists and is unique and positive because the function $\int_0^x (x - b)\, dB(b)$ is continuous (left and right derivatives exist everywhere), is equal to 0 at $x = 0$, is increasing if it is not equal to 0, and goes to $\infty$ as $x$ increases.

We have

$$\int_0^\infty (b + \zeta_0(b))^2\, dB(b) = \int_0^\alpha (b + \zeta_0(b))^2\, dB(b) + \int_\alpha^\infty (b + \zeta_0(b))^2\, dB(b)$$

$$\ge \int_0^\alpha (b + \zeta_0(b))^2\, dB(b) + \int_\alpha^\infty b^2\, dB(b) + 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b)$$

by non negativity of $\zeta_0$

$$\ge \frac{\left( \int_0^\alpha (b + \zeta_0(b))\, dB(b) \right)^2}{\int_0^\alpha dB(b)} + \int_\alpha^\infty b^2\, dB(b) + 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b)$$

by the Schwarz inequality

$$= \frac{\left( \alpha \int_0^\alpha dB(b) - E(\zeta_0) + \int_0^\alpha \zeta_0(b)\, dB(b) \right)^2}{\int_0^\alpha dB(b)} + \int_\alpha^\infty b^2\, dB(b) + 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b)$$

by the definition of $\alpha$

$$\ge \frac{\left( \alpha \int_0^\alpha dB(b) - \int_\alpha^\infty \zeta_0(b)\, dB(b) \right)^2}{\int_0^\alpha dB(b)} + \int_\alpha^\infty b^2\, dB(b) + 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b)$$

by constraint (11)

$$\ge \alpha^2 \int_0^\alpha dB(b) - 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b) + \int_\alpha^\infty b^2\, dB(b) + 2\alpha \int_\alpha^\infty \zeta_0(b)\, dB(b)$$

$$= \alpha^2 \int_0^\alpha dB(b) + \int_\alpha^\infty b^2\, dB(b)$$

This lower bound is achieved if

$$b + \zeta_0(b) = \begin{cases} \alpha & b \le \alpha \\ b & b > \alpha \end{cases}$$

i.e. if

$$\zeta_0(b) = \begin{cases} \alpha - b & b \le \alpha \\ 0 & b > \alpha \end{cases}$$

This satisfies (11) with equality because of the definition of $\alpha$ and is thus optimal. This result is intuitively pleasing.

As noted above, the constraint that $\zeta_0(b)$ must have integer values makes the problem much more difficult, except if $\alpha$ happens to be an integer. In general, constraint (11) will not be satisfied with equality by an integer solution if $\zeta_0(b)$ is a deterministic function of $b$.
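The optimal rule of this section pads every message shorter than $\alpha$ up to length $\alpha$ and leaves longer messages alone. The sketch below computes $\alpha$ and the resulting padding for a discrete message length distribution. It is an illustration only: the function names `water_level` and `optimal_padding` are hypothetical, the distribution is assumed to have finitely many lengths with strictly positive probabilities, and the padding is allowed to be non-integer as in the relaxed problem.

```python
def water_level(lengths, probs, mean_padding):
    """Return alpha solving sum_b Pr(b) * max(alpha - b, 0) = mean_padding."""
    pairs = sorted(zip(lengths, probs))
    mass = 0.0      # Pr(b <= current breakpoint)
    weighted = 0.0  # sum of b * Pr(b) over those lengths
    for k, (b, p) in enumerate(pairs):
        mass += p
        weighted += b * p
        # Between this breakpoint and the next, the left side is
        # alpha * mass - weighted, which is linear in alpha.
        alpha = (mean_padding + weighted) / mass
        next_b = pairs[k + 1][0] if k + 1 < len(pairs) else float("inf")
        if alpha <= next_b:
            return alpha

def optimal_padding(lengths, probs, mean_padding):
    """Return alpha and the padding alpha - b (or 0) for each length."""
    alpha = water_level(lengths, probs, mean_padding)
    return alpha, [max(alpha - b, 0.0) for b in lengths]

# Lengths 1 and 3 are equally likely; half a padding bit per message on average.
alpha, padding = optimal_padding([1, 3], [0.5, 0.5], 0.5)
second_moment = sum(p * (b + z) ** 2
                    for b, p, z in zip([1, 3], [0.5, 0.5], padding))
print(alpha, padding, second_moment)
```

In this small case the short message is padded from 1 to 2 and the long one is untouched, which meets the mean padding constraint with equality.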


1 


7.  Numerical Results

Many of the results of this chapter have limited practical interest. This is due to the fact that generally there are no low priority bits to be sent. However, the analysis of the previous sections is relevant as far as the use of flags is concerned. We will briefly consider how the flag lengths should be chosen to minimize the average waiting time when no low priority bits are sent (Formula (9) with $\zeta_0 = \zeta_1 = \zeta_2 = 0$).

We recall that we use a flag of length $v_1$ to indicate the end of a busy period, while flags of length $v_2$ are sent during the residue of the idle periods.

From numerical computations it appears that the choice $v_2 = 2$ is never worse than $v_2 = 1$ and is in fact optimal in light traffic. In heavy traffic the second flag is rarely used, so its optimal length increases somewhat to reduce the probability of an insertion in the first message of a busy period. The effect on $Ew$ is relatively negligible, as illustrated in Table 4.1.

''2*^  '^2*^ 

XEb  ■ .5  ■ 3 

Ew  ■ . 1.739  1.733  1.975 

XEb  ■ .95  ■ 9 

Ew  ■ 80.971  80.724  80.646 

Table 4.1

Influence of $v_2$ on $Ew$
($Eb = 8$, $Eb^2 = 64$)

The situation is much more complicated as far as $v_1$ is concerned. The presence of the expressions $2^{-(v_1-1)} Eb$ and $\lambda 2^{-(v_1-1)}$ respectively in the numerator and denominator of the first term of the right hand side of (9) makes the optimal $v_1$ an increasing function of $Eb$ and $\lambda$. Contrary to the case of $v_2$, $Ew$ is quite sensitive to the value of $v_1$, especially when the load is heavy.

I 

Eb^  - 64 

for 
for 

We illustrate in Tables 4.2 and 4.3 the behavior of the optimal value of $v_1$ as the load increases for two different message length statistics:

a)  $Eb = 8$,  $Eb^2 = 64$

b)  $Eb = 5$,  $Eb^2 = 30$

The first case represents the transmission of single characters without special source encoding, whereas the second is representative of the message length statistics when some source coding (see Chapter II) is performed. Note that we did not take into account the effects that occur when flags are longer than messages.

We do not give examples with larger average message lengths: except in very heavy traffic the improvement in performance brought by the use of optimal length flags does not warrant the increased complexity. It seems more sensible to send "0"s when the line is idle, and to prefix every message with a "1".

If  Eb  - 8 

V2  . 2 

AEb  * .95 
then  Ew  ■ 92.89 
» 80.72 


Table 4.2

Optimal $v_1$ as a Function of the Load
$Eb = 8$,  $Eb^2 = 64$,  $v_2 = 2$

  λEb    optimal v_1    Ew for optimal v_1    Ew for v_1 = 2    Ew for v_1 = 10
  .05         3                1.73                 1.74               1.95
  .10         3                1.99                 2.01               2.43
  .15         3                2.28                 2.31               2.92
  .20         3                2.61                 2.65               3.43
  .25         3                2.97                 3.04               3.97
  .30         3                3.40                 3.50               4.55
  .35         3                3.89                 4.02               5.18
  .40         4                4.45                 4.65               5.87
  .45         4                5.11                 5.41               6.64
  .50         4                5.90                 6.33               7.52
  .55         4                6.87                 7.50               8.55
  .60         4                8.09                 9.00               9.79
  .65         5                9.66                11.02              11.34
  .70         5               11.69                13.88              13.36
  .75         5               14.55                18.23              16.13
  .80         6               18.81                25.67              20.24
  .85         6               25.83                41.26              27.01
  .90         7               39.71                94.71              40.48
  .95         9               80.72                  ∞                80.84

Table 4.3

Optimal $v_1$ as a Function of the Load
$Eb = 5$,  $Eb^2 = 30$,  $v_2 = 2$

  λEb    optimal v_1    Ew for optimal v_1    Ew for v_1 = 2    Ew for v_1 = 10
  .05         3                1.69                 1.69               2.05
  .10         3                1.89                 1.90               2.59
  .15         3                2.12                 2.14               3.13
  .20         3                2.38                 2.41               3.66
  .25         3                2.67                 2.72               4.20
  .30         3                3.00                 3.09               4.74
  .35         3                3.39                 3.52               5.30
  .40         3                3.85                 4.04               5.89
  .45         3                4.39                 4.66               6.52
  .50         4                5.03                 5.44               7.22
  .55         4                5.79                 6.44               8.02
  .60         4                6.75                 7.76               8.97
  .65         4                7.99                 9.60              10.15
  .70         5                9.62                12.30              11.67
  .75         5               11.84                16.71              13.75
  .80         6               15.19                25.17              16.83
  .85         6               20.61                47.92              21.91
  .90         7               31.26               321.00              32.02
  .95         9               62.36                  ∞                62.42
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1 . Introduction 

In the previous chapters we have examined ways to encode the message contents and lengths, and to differentiate between idle and message bits. We will study here how to encode message origins and destinations in a simple case. The model is as follows:


Figure 5.1: The Model


Messages are sent from m asynchronous sources (indexed i = 1, 2, ..., m) to a concentrator containing an infinite buffer. From there they are transmitted over a noiseless binary synchronous link to a "deconcentrator" which sends the messages to their destinations (indexed i = 1, 2, ..., n). We observe that in general the destinations must be indicated by the sources to the concentrator, the origins and destinations must be indicated to the deconcentrator, while the origins alone need to be indicated to the receivers.

To simplify the model, we can associate a virtual source and receiver with each source-receiver pair, as in the following figure.


Each source sends messages only to the corresponding receiver, so it is enough to indicate to the deconcentrator the message origins. We will consider only this reduced problem.


2.  Basic Idea

Assume now that there are M independent sources, and that messages from source i arrive at the concentrator in a Poisson manner at rate $\lambda$, so that, as seen by the concentrator, the probability that the next message comes from source i is 1/M. Does this imply that we need at least an average of $\log_2 M$ bits per message to indicate the origins to the deconcentrator? The negative answer to this question justifies the existence of this chapter.

If the messages were sent out by the concentrator in the order they were received, $\log_2 M$ would be a lower bound to the average number of bits per message. However, although in general messages from a given source must be sent in the order they were received, to ensure the intelligibility of the sequence of messages, there is no reason for messages from different sources to be transmitted in this fashion. It is precisely the possibility of reordering the messages that permits a decrease in the amount of information. We will illustrate by two examples how easily this can be done.

In both cases we assume as in Chapter 4 that each message contains a codeword indicating its length, that the sources are ergodic, that the mean interemission time of source i is $E(a_i)$, and that the mean length of messages from source i is $E(b_i)$. In both techniques we queue the messages in a special buffer according to their origins.

In technique I we transmit a "0" if buffer i is empty; if not, we transmit a "1" followed by a message. We then go to buffer (i+1) mod M and repeat the process.

In technique II we still transmit a "0" if the buffer is empty; if it is not empty we transmit all messages present, prefixing each of them with a "1". We go to buffer (i+1) mod M and repeat the procedure.

In both cases, if the receiver is initially synchronized and if there is no transmission error, the receiver will be able to recognize the origins of all messages.
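Technique I can be sketched in a few lines to show how the receiver recovers the origins from its position in the polling cycle alone. This is a toy illustration under stated assumptions: the names `encode_technique_1`, `decode_technique_1` and `MSG_BITS` are hypothetical, every message is a fixed-length bit string (standing in for the length codeword carried by real messages), and no new messages arrive while the buffers are being served.

```python
from collections import deque

MSG_BITS = 8  # assumption: fixed-length messages replace self-delimiting ones

def encode_technique_1(buffers, visits):
    """Visit the buffers cyclically: '0' if empty, else '1' plus one message."""
    out = []
    i = 0
    for _ in range(visits):
        if buffers[i]:
            out.append("1" + buffers[i].popleft())
        else:
            out.append("0")
        i = (i + 1) % len(buffers)
    return "".join(out)

def decode_technique_1(bits, num_sources):
    """Recover (origin, message) pairs by tracking the polling position."""
    received = []
    i = 0
    pos = 0
    while pos < len(bits):
        if bits[pos] == "1":
            received.append((i, bits[pos + 1:pos + 1 + MSG_BITS]))
            pos += 1 + MSG_BITS
        else:
            pos += 1
        i = (i + 1) % num_sources
    return received

buffers = [deque(["00000001"]), deque(), deque(["11111111", "10101010"])]
stream = encode_technique_1(buffers, visits=6)
decoded = decode_technique_1(stream, num_sources=3)
print(decoded)
```

No origin field is ever transmitted: each protocol bit only says whether the buffer currently polled had something to send.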

By a reasoning similar to the one in Section 1 of Chapter 4, we obtain the result that the average number of protocol bits (the "0"s and the "1"s) per message is equal to

$$\frac{1 - \sum_{i=1}^{M} E b_i / E a_i}{\sum_{i=1}^{M} 1 / E a_i} \qquad (1)$$

for all techniques resulting in a stable system. One sees that in heavy traffic ($\sum_{i=1}^{M} E b_i / E a_i \lesssim 1$) this quantity will be very small. We recognize that amongst the protocol bits, some indicate that the line is "idle" (all buffers are empty), while others effectively indicate the origin of the messages, but the receiver is incapable of differentiating between these two kinds.

The conceptual difficulty of defining a "protocol bit" that we met in Chapter 4 reappears even more strongly here. We could try to reintroduce the concept of "low priority bit" from Chapter 4 but this does not appear to lead to very useful results. We will rather use two other approaches: in Section 3 we will modify the model and neglect completely the idle bits, concentrating on the study of how the reordering of the messages can decrease the amount of information necessary to specify their origins. In Sections 4 to 7 we will analyze some strategies to transmit the messages and their origins in an efficient manner, the goal being to minimize the expected message delay.
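Formula (1) of this section, the average number of protocol bits per message, is simple enough to evaluate directly. The helper below is a minimal sketch; the name `protocol_bits_per_message` and its argument names are hypothetical, and the inputs are the mean message lengths $E b_i$ (in bits) and mean interemission times $E a_i$ (in bit times) of the M sources.

```python
def protocol_bits_per_message(mean_lengths, mean_intervals):
    """Evaluate (1 - sum_i Eb_i/Ea_i) / (sum_i 1/Ea_i) for a stable system."""
    load = sum(b / a for b, a in zip(mean_lengths, mean_intervals))
    if load >= 1.0:
        raise ValueError("unstable system: the load must be below 1")
    message_rate = sum(1.0 / a for a in mean_intervals)
    return (1.0 - load) / message_rate

# Two sources, 4-bit messages on average, one message every 10 bit times each.
print(protocol_bits_per_message([4, 4], [10, 10]))
# The same sources emitting every 8.2 bit times: the load is close to 1.
print(protocol_bits_per_message([4, 4], [8.2, 8.2]))
```

The second call illustrates the heavy traffic remark: as the load approaches 1 the line has almost no room left for protocol bits, so their number per message becomes small.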
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3.  A Simplified  Model 
A.  Introduction 


To  avoid  the  difficulties  associated  with  the  presence  of  idle 
times  in  the  usual  queueing  model,  but  still  be  in  a position  to  study 
the  influence  of  the  reordering  of  the  messages  on  the  amount  of 
information  necessary  to  specify  their  origins,  we  study  the  following 
model  where  we  keep  the  number  of  messages  in  the  queue  constant. 


B.  Notation and Description of the Model

At time 0 a buffer contains $N-1$ messages, of which $m_j$ came from source $j$, $j = 1, 2, \ldots, M$.

At time $i + \frac{1}{2}$, $i = 0, 1, \ldots$, one and only one new message enters; it comes from source $j$ with probability $p_j$, independently of the initial content of the buffer and of the previous arrivals. We denote its origin by $X_i$.

At time $i + \frac{3}{4}$ one and only one message is removed from the buffer. We denote its origin by $Y_i$.

We denote by $S_i$ the state of the buffer at time $i$, i.e. $S_i$ is an M-tuple $(m_1, m_2, \ldots, m_M)$, $\sum_{j=1}^{M} m_j = N - 1$, where $m_j$ is the number of messages from source $j$ present in the buffer at time $i$. One sees that the number of possible values of $S_i$ is $\binom{N+M-2}{N-1}$ [Feller, 1968, p. 38], which we denote by $\alpha$. We index in some way the values of $S_i$, and denote them by $s_1, s_2, \ldots, s_\alpha$. The probability distribution of $S_0$ is known a priori. We denote it by a row matrix whose $j$-th component is equal to $\Pr(S_0 = s_j)$.

Similarly, $S_i^+$ denotes the state of the buffer at time $i + \frac{1}{2}$. $N$ messages are present in the buffer at that time, so that $S_i^+$ can take $\alpha^+ := \binom{N+M-1}{N}$ different values, denoted $s_1^+, s_2^+, \ldots, s_{\alpha^+}^+$.

Very often we will need to deal with sequences of inputs and outputs. $X_{[i,j)}$ denotes the sequence $(X_i, X_{i+1}, \ldots, X_{j-1})$ and we define $Y_{[i,j)}$ in a similar fashion.

It will prove useful to define a function $U$ (for Unordering) whose domain is the set of sequences $X_{[i,j)}$ and $Y_{[i,j)}$ and whose values are M-tuples. The $k$-th component of $U(X_{[i,j)})$ is the number of $X_l$ in $X_{[i,j)}$ that are equal to $k$.

We can use $U$ immediately to verify the relation

$$S_i + U(X_{[i,j)}) - U(Y_{[i,j)}) = S_j , \qquad i \le j$$

If a suitable probability distribution has been defined, $H(Y_{[i,j)})$ denotes the entropy of $Y_{[i,j)}$, i.e.

$$H(Y_{[i,j)}) := - \sum_{y_{[i,j)}} \Pr(Y_{[i,j)} = y_{[i,j)}) \log_2 \Pr(Y_{[i,j)} = y_{[i,j)})$$

To avoid the introduction of more symbols, we also use $H$ in the following sense: if $c$ is an s-tuple, $c = (c_1, c_2, \ldots, c_s)$, with non negative components, we define $H(c) := - \sum_{i=1}^{s} c_i \log_2 c_i$. The meaning of $H(\cdot)$ will always be clear from the context.
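The unordering function $U$ and the state counts of this section are easy to make concrete. The sketch below is illustrative only: `unorder` and `num_states` are hypothetical names, sources are numbered 1 to M, and the arrival and departure sequences are chosen by hand so that no buffer count ever goes negative.

```python
from collections import Counter
from math import comb

def unorder(sequence, num_sources):
    """U: count how many entries of the sequence equal each source index."""
    counts = Counter(sequence)
    return tuple(counts.get(k, 0) for k in range(1, num_sources + 1))

def num_states(num_messages, num_sources):
    """Number of M-tuples of non negative integers summing to num_messages."""
    return comb(num_messages + num_sources - 1, num_messages)

M, N = 3, 3
s_start = (1, 0, 1)        # buffer state at time 0: N - 1 = 2 messages
arrivals = [2, 2, 1]       # origins of the three messages that enter
departures = [1, 3, 2]     # origins of the three messages that leave
u_in = unorder(arrivals, M)
u_out = unorder(departures, M)
s_end = tuple(s + a - d for s, a, d in zip(s_start, u_in, u_out))
print(s_end, num_states(N - 1, M), num_states(N, M))
```

The final state again holds $N - 1$ messages, and the two counts printed are the number of values available to the state just before and just after an arrival.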


C.  Objective

The problem we wish to study is to find an "optimal" way of making the $Y_i$'s known to an observer watching the output of the buffer. This involves two distinct points: first, at time $i + \frac{3}{4}$ the transmitter must decide what message to send out, i.e. the value of $Y_i$. There is a constraint on $Y_i$: one can only send out a message that is actually in the buffer. Mathematically this translates into the statement: "all components of $S_i$ must be non negative," and was implicitly taken into account when we determined the number of states. Second, the receiver must be able to recognize $Y_i$. To that effect we allow a binary codeword of (variable) length $n_i$ to be transmitted in front of every message, and we require that the knowledge of the codewords transmitted at times $j + \frac{3}{4}$, $j = 0, 1, 2, \ldots, i$, and of $Y_{[0,i)}$ uniquely specifies $Y_i$.

Our objective will be to minimize the "expected number of protocol bits per message,"

$$h := \limsup_{T \to \infty} E\, \frac{1}{T} \sum_{i=0}^{T-1} n_i$$

over all possible encoding strategies, i.e. the choice of the message to be sent next, and the choice of the codewords indicating what message is sent.

We will give some examples in Section D and a lower bound in Section E. Finally we show in Section F how dynamic programming can be used to find the "optimal" choice of the message to be sent next.

D.  Examples of Strategies

The end of the previous section may be made clearer by considering the following strategies.

STRATEGY I

We transmit the messages in the order they entered the buffer; this is the only choice if $N = 1$. The probability that $(Y_i = k)$ is equal to $p_k$, $i \ge N$; thus the best we can do is to use a Huffman code to indicate the message origins, and the average number of protocol bits per message, $h$, will be bounded by

$$H(p) \le h < H(p) + 1$$

where $p := (p_1, p_2, \ldots, p_M)$.

STRATEGY II

We do the following: at time .75 we send a Huffman codeword specifying $S_0^+$; at times .75, 1.75, ..., N - .25 we transmit in some prespecified order (e.g. all messages from source 1, followed by messages from 2, etc.) all messages present in the buffer at time .5. Note that no codewords are needed at times 1.75, 2.75, ..., N - .25. At time N + .75 we transmit a codeword specifying $S_N^+$ and repeat the procedure.

The probability that $S_{kN}^+ = (m_1, m_2, \ldots, m_M)$, $k \ge 1$, is equal to

$$\frac{N!}{m_1! \cdots m_M!}\, p_1^{m_1} \cdots p_M^{m_M}, \qquad m_j \ge 0, \quad \sum_{j=1}^{M} m_j = N ,$$

thus

$$\frac{H(p^{*N})}{N} \le h < \frac{H(p^{*N}) + 1}{N}$$

where $H(p^{*N})$ denotes the entropy of the multinomial probability distribution.

It is of interest to examine how this expression behaves as $N$ increases. We can write

$$\frac{H(p^{*N})}{N} = - \sum_{j=1}^{M} p_j \log_2 p_j - \frac{1}{N} \log_2 N! + \frac{1}{N} \sum_{j=1}^{M} \sum_{m=1}^{N} \binom{N}{m} p_j^m (1 - p_j)^{N-m} \log_2 m!$$

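To get a lower bound we use the log-convexity of the gamma function to obtain

$$\frac{H(p^{*N})}{N} \ge - \sum_{j=1}^{M} p_j \log_2 p_j - \frac{1}{N} \log_2 N! + \frac{1}{N} \sum_{j=1}^{M} \log_2 \Gamma(1 + N p_j)$$

The use of Stirling's formula [Feller, 1968, p. 52], tight if $N p_j \gg 1$,

$$\log \Gamma(1 + x) \ge \log \sqrt{2\pi} + (x + \tfrac{1}{2}) \log x - x \log e$$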


yields 




3 = 1 


(2) 


To obtain an upper bound, we use Stirling's formula for $\log m!$ together with the inequality

$$\log m \le \log N p_j + \frac{m - N p_j}{N p_j} \log e$$


This  yields 


log  m!  ^ log^ 


AirNp , 

< log-/ -J-  + m 


log  e 


, log  e 

“t  " 2Nf-, 

e *^3  J 


lU  . 


This does not hold at $m = 0$ when $N p_j < .43$ but is otherwise satisfied. Using this in the formula for $H(p^{*N})$, and using Stirling's approximation for $\log N!$, we obtain


3-1  •' 

We can thus conclude that for this strategy, the expected number of protocol bits per message is equal to $\frac{M-1}{2}\, \frac{\log_2 N}{N} + O\!\left(\frac{1}{N}\right)$.
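The multinomial entropy used for Strategy II can be evaluated exactly for moderate $N$ from the expansion in terms of binomial marginals. The sketch below does this and compares the per-message value with the leading term $\frac{M-1}{2} \frac{\log_2 N}{N}$. It is an illustration under stated assumptions: the function names are hypothetical, and `math.lgamma` is used to obtain $\log_2 m!$ in floating point.

```python
import math

def log2_factorial(n):
    """log2(n!) computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)

def multinomial_entropy_bits(n, probs):
    """Entropy (in bits) of the multinomial distribution with n trials."""
    h = -n * sum(p * math.log2(p) for p in probs if p > 0)
    h -= log2_factorial(n)
    for p in probs:
        for m in range(1, n + 1):
            weight = math.comb(n, m) * p ** m * (1 - p) ** (n - m)
            h += weight * log2_factorial(m)
    return h

probs = [0.5, 0.5]
for n in (1, 2, 10, 100):
    per_message = multinomial_entropy_bits(n, probs) / n
    leading = (len(probs) - 1) / 2 * math.log2(n) / n
    print(n, round(per_message, 4), round(leading, 4))
```

For small $N$ the $O(1/N)$ remainder dominates, so the two columns only become comparable as $N$ grows.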

STRATEGY III

Here we note that at time $i + \frac{1}{2}$ there is at least one source such that $\left\lfloor \frac{N+M-1}{M} \right\rfloor$ messages from it are stored in the buffer. We send the binary representation of the index of this source, then the $\left\lfloor \frac{N+M-1}{M} \right\rfloor$ messages. The average number of protocol bits per message is bounded by:

$$\frac{\log_2 M}{\left\lfloor \frac{N+M-1}{M} \right\rfloor} \le h < \frac{\log_2 M + 1}{\left\lfloor \frac{N+M-1}{M} \right\rfloor}$$

Here for large $N$, $h$ is approximately equal to $\frac{M}{N} \log_2 M$, which is better than in Strategy II. However, for small $N$, II may be better.

The two following strategies will be studied for the case $M = 2$ and $p_1 = p_2 = .5$. A comparison of all strategies for this case appears in Table 5.1.

STRATEGY IV

Strategy IV is essentially polling: one transmits as many messages from source 1 as possible, until none remains in the buffer. One then transmits a run of messages from source 2, then 1 again, etc. The end of a run can be indicated by a flag as studied in Chapter III.

If $N = 1$, each run has a geometric probability distribution. In general, the probability distribution of a run is the distribution of the sum of $N$ independent geometric random variables, and thus a Pascal distribution:

$$\Pr(\text{run} = n) = \binom{n-1}{n-N} \left( \frac{1}{2} \right)^{n}, \qquad n = N, N+1, \ldots$$

Its mean is equal to $2N$, so we can bound the expected number of protocol bits per message by

$$\frac{\text{Entropy of run}}{2N} \le h \le \frac{\text{Entropy of run} + .56}{2N}$$

The upper bound holds if the assumptions made in Section 4 of Chapter III are satisfied.

We now turn our attention to evaluating the entropy. This can be done numerically; results appear in Table 5.1. To obtain asymptotic results we note that the entropy is equal to

$$2N - \sum_{n=N}^{\infty} \binom{n-1}{n-N} \left( \frac{1}{2} \right)^{n} \log_2 \binom{n-1}{n-N}$$

Writing $\log \binom{n-1}{n-N} = \sum_{i=1}^{N-1} \log(n - N + i) - \log (N-1)!$, we see, since $\log(n - N + i)$ is a concave function of $n$, that $-\log \binom{n-1}{n-N}$ is convex in $n$. By Jensen's inequality we can lower bound the entropy by $2N - \log_2 \binom{2N-1}{N}$ and, using Stirling's approximation, by $\frac{1}{2} \log_2 4\pi N$. Writing $\log \binom{n-1}{n-N} = \log (n-1)! - \log (n-N)! - \log (N-1)!$, using the convexity of the first term, using Stirling's approximation together with the formula $\log_e x \le x - 1$ for the second term, and Stirling's approximation for the third, one can upper bound the entropy by $\frac{1}{2} \log_2 (4\pi e^2 N)$. Thus, for Strategy IV, $h$ behaves like $\frac{\log_2 N}{4N} + O\!\left(\frac{1}{N}\right)$. This is about twice as good as Strategy II.
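The entropy of the run length in Strategy IV can be evaluated numerically from the Pascal distribution and compared with the two bounds $\frac{1}{2} \log_2 4\pi N$ and $\frac{1}{2} \log_2 (4\pi e^2 N)$. The following sketch is an illustration only: `run_entropy_bits` is a hypothetical name, and the infinite sum is truncated at a point (an assumption of this sketch) beyond which the remaining probability is negligible in double precision.

```python
import math

def run_entropy_bits(n_buffer):
    """Entropy of the Pascal run length, Pr(run = n) = C(n-1, N-1) / 2**n."""
    n_max = 40 * n_buffer + 200  # truncation point, far into the tail
    entropy = 0.0
    for n in range(n_buffer, n_max + 1):
        ways = math.comb(n - 1, n_buffer - 1)
        prob = ways / 2 ** n           # exact integer ratio, then float
        log2_prob = math.log2(ways) - n
        entropy -= prob * log2_prob
    return entropy

for n_buffer in (1, 5, 20):
    entropy = run_entropy_bits(n_buffer)
    lower = 0.5 * math.log2(4 * math.pi * n_buffer)
    upper = 0.5 * math.log2(4 * math.pi * math.e ** 2 * n_buffer)
    print(n_buffer, round(lower, 3), round(entropy, 3), round(upper, 3),
          round(entropy / (2 * n_buffer), 4))
```

The last column is the entropy divided by the mean run length $2N$, i.e. the lower bound on $h$ for this strategy.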


STRATEGY V

As mentioned earlier, we study this strategy only for $M = 2$ with $p_1 = p_2 = .5$. Suppose that at time $i + .5$ we know that only messages from source $j$ ($j$ = 1 or 2) are in the buffer. We can then send $N$ of them without any codeword, and the distribution of $S_{i+N}^+$ will be binomial. We then alternate between messages from 1 and 2, until this becomes impossible because the buffer contains only one kind of message. We then signal the end of the run, e.g. by a flag.

The expected number of protocol bits per message is thus bounded by

$$\frac{\text{Entropy of run}}{N + E(\text{run})} \le h \le \frac{\text{Entropy of run} + .56}{N + E(\text{run})}$$

The upper bound holds if the assumptions made in Section 4 of Chapter III are satisfied.

It is of primary importance to study the statistics of the run. Assume that we try to send a message from source 1 at odd times, and a message from source 2 at even times. $S_i^+$ performs a non-stationary random walk; with probability .5, $S_{i+1}^+ = S_i^+$, whereas with probability .5, $S_{i+1}^+ = S_i^+ + (-1, 1)$ if $i$ is odd, and $S_{i+1}^+ = S_i^+ + (1, -1)$ if $i$ is even. A run stops if $S_i^+ = (0, N)$ with $i$ odd, or $(N, 0)$ with $i$ even. However, we note that as far as the statistics of the remaining time in the run is concerned, being in state $S_i^+ = (k, N-k)$ at time $i$ is equivalent to being in state $(N-k, k)$ at time $i+1$. We can thus describe the process by an $(N+1) \times (N+1)$ transition matrix corresponding to a stationary process.

It is entirely feasible to compute the distribution of the time until trapping in state $(0, N)$ if the initial probability distribution of the state is binomial, by classical Markov chain methods, e.g. [Howard, 1971, vol. I]. Results appear in Table 5.1. Fortunately, the mean time until trapping has a simple form. Denoting by $g(m_1, m_2)$ the mean time until trapping if the initial state is $(m_1, m_2)$, one finds the relations

$$g(0, m_2) = 0$$

$$g(m_1, m_2) = 1 + \tfrac{1}{2} \left( g(m_2 + 1, m_1 - 1) + g(m_2, m_1) \right), \qquad m_1 > 0$$

The solution to this system of equations is

$$g(m_1, m_2) = 2 m_1 (2 m_2 + 1)$$

Averaging on the binomial distribution of the initial state, one finds that the expected run is equal to $N^2$. It is now easy to upper bound the entropy of the run: by [Gallager, 1968, p. 507] it is upper bounded by $(N^2 + 1)\, H\!\left( \frac{1}{N^2 + 1} \right)$ where $H$ is the binary entropy, i.e. $H(x) := H((x, 1-x))$. This bound is extremely close to the actual value (the relative difference is less than 1%), indicating that the probability distribution of the run is nearly geometric. From the results of Section 4 of Chapter III, fixed-length flags will be almost optimal.
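The closed form $g(m_1, m_2) = 2 m_1 (2 m_2 + 1)$ and the expected run $N^2$ can be checked mechanically with exact rational arithmetic. The sketch below only verifies that the closed form satisfies the recursion and averages it over the binomial initial state; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def g(m1, m2):
    """Closed form for the mean time until trapping from state (m1, m2)."""
    return Fraction(2 * m1 * (2 * m2 + 1))

def satisfies_recursion(n):
    """Check g(m1, m2) = 1 + (g(m2 + 1, m1 - 1) + g(m2, m1)) / 2 for m1 > 0."""
    for m1 in range(1, n + 1):
        m2 = n - m1
        rhs = 1 + Fraction(1, 2) * (g(m2 + 1, m1 - 1) + g(m2, m1))
        if g(m1, m2) != rhs:
            return False
    return g(0, n) == 0

def expected_run(n):
    """Average g over the binomial(n, 1/2) distribution of the initial state."""
    return sum(Fraction(comb(n, m1), 2 ** n) * g(m1, n - m1)
               for m1 in range(n + 1))

for n in (1, 2, 5, 10):
    print(n, satisfies_recursion(n), expected_run(n))
```

Because the chain is absorbing, the recursion has a unique solution, so passing this check is enough to confirm the closed form for the values of $N$ tested.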

Because $H(x) \le x \log_2 (e / x)$, $h$ is upper bounded by

$$h < \frac{\log_2 (e (N^2 + 1)) + .56}{N^2 + N}$$

The presence of $N^2$ in the denominator makes this scheme markedly superior to all others. Note that it is the combination of two features that makes it efficient:

- the fact that it does not attempt to send a message for $N$ time units after detecting that no such message is present;

- the fact that it alternates between sources.

Strategy IV (polling) has the first feature, but not the second; we have seen that the expected run is equal to $2N$. If one uses pure alternating, the expected run will be equal to $1 + \frac{1}{2} g(1, N-1) + \frac{1}{2} g(0, N) = 2N$, instead of $N^2$ when both features are present.

jl  2 

There  are  strategies  for  which  h behaves  like  (k  log(N))/N 
+ 0 (ij)  even  when  M > 2 . We  describe  now  such  a strategy  for  the 

symmetric  case  (p^  * ^ , i»l,2, . . . ,M)  . It  is  a generalization  of 
Strategy  V. 

One  removes  one  message  from  each  source  in  cycles  (say  1,2,3,... 

M,l,2,.. ..)  until  this  becomes  impossible.  One  transmits  then  M-1 

codewords  indicating  the  number  of  messages  from  each  origin  remaining 
in  the  buffer,  and  those  N messages.  This  being  done  the  distribution 
of  the  buffer  state  is  multinomial  emd  we  start  the  procedure  again, 
removing  one  message  from  each  source  in  cycles.  We  call  the  number  of 
messages  transmitted  during  the  cyclic  part  of  this  strategy  a run  . 

If one uses a flag strategy as described in Section 4 of Chapter III to indicate the end of a run, $h$ will be upperbounded by

$$h < \frac{\log_2\left(e\,(E(\text{run}) + 1)\right) + .57 + (M-1)\lceil \log_2 N \rceil}{E(\text{run}) + N}$$
If one can show that $E(\text{run})$ is proportional to $N^2$, the desired result will be obtained. $E(\text{run})$ can be computed as in Strategy V. If $g(m_1, \ldots, m_M)$ denotes the expected run length if the initial state is $(m_1, \ldots, m_M)$, one has the relations

$$g(0, m_2, \ldots, m_M) = 0$$

$$g(m_1, m_2, \ldots, m_M) = 1 + \frac{1}{M}\Big(g(m_2+1, m_3, \ldots, m_M, m_1-1) + g(m_2, m_3+1, \ldots, m_M, m_1-1) + \cdots + g(m_2, m_3, \ldots, m_M, m_1)\Big), \qquad m_1 > 0$$

This can be solved numerically. For $M = 3$ we obtain the expression

$$g(m_1, m_2, m_3) = \frac{3 m_1 (3 m_2 + 1)(3 m_3 + 2)}{3(m_1 + m_2 + m_3) + 1}.$$

$E(\text{run})$ is equal to the average of $g(\cdot)$ over the multinomial distribution of the initial state. If $M = 3$ we obtain $E(\text{run}) = N(N^2 + 1)/(3N + 1)$, which is approximately equal to $N^2/3$ for large $N$, as desired.
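The closed form for $M = 3$ can be checked mechanically. The sketch below is only an illustration: it assumes the recurrence $g(m_1, m_2, m_3) = 1 + \tfrac{1}{3}\,[\,g(m_2+1, m_3, m_1-1) + g(m_2, m_3+1, m_1-1) + g(m_2, m_3, m_1)\,]$ with $g(0, \cdot, \cdot) = 0$, and the function names are hypothetical. It verifies in exact arithmetic that the closed form satisfies this recurrence, and that its multinomial average equals $N(N^2+1)/(3N+1)$.

```python
from fractions import Fraction
from math import factorial


def g3(m1, m2, m3):
    """Closed form for the expected run length when M = 3 (exact arithmetic)."""
    return Fraction(3 * m1 * (3 * m2 + 1) * (3 * m3 + 2), 3 * (m1 + m2 + m3) + 1)


def satisfies_recurrence(n):
    """Check the assumed recurrence on every state holding n messages."""
    for m1 in range(n + 1):
        for m2 in range(n + 1 - m1):
            m3 = n - m1 - m2
            if m1 == 0:
                # The run stops at once when the source to be served is empty.
                if g3(m1, m2, m3) != 0:
                    return False
                continue
            # Serve source 1, rotate the sources, then add one arrival
            # whose origin is uniform over the three sources.
            rhs = 1 + Fraction(1, 3) * (
                g3(m2 + 1, m3, m1 - 1) + g3(m2, m3 + 1, m1 - 1) + g3(m2, m3, m1)
            )
            if g3(m1, m2, m3) != rhs:
                return False
    return True


def expected_run(n):
    """Average of g3 over the symmetric multinomial distribution of n messages."""
    total = Fraction(0)
    for m1 in range(n + 1):
        for m2 in range(n + 1 - m1):
            m3 = n - m1 - m2
            weight = Fraction(
                factorial(n),
                factorial(m1) * factorial(m2) * factorial(m3) * 3 ** n,
            )
            total += weight * g3(m1, m2, m3)
    return total
```

For example, `expected_run(4)` evaluates the average for a buffer of four messages as an exact fraction.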

We are unable to solve this system of equations for all N, but can lowerbound $E(\text{run})$ by the following method.

Let $(m_1^j, m_2^j, \ldots, m_M^j)$ denote the state of the buffer at time $j + .5$. Assume that at time .5 the state distribution is multinomial, and start removing the messages in cycles. In order to obtain the bound we remove the constraint that the $m_i^j$'s must be non negative. Thus the buffer state performs a non-stationary random walk and

$$\Pr(\text{run} \le j) = \Pr\Big(\min_{0 \le k \le j} m^{k}_{1 + (k \bmod M)} \le 0\Big) \le \sum_{i=1}^{M} \Pr\Big(\min_{k \in \mathbb{N}:\ 0 \le i+kM-1 \le j} m_i^{\,i+kM-1} \le 0\Big), \qquad j = 0, 1, \ldots$$


We recall a version of Kolmogorov's inequality [Karlin, 1975, p. 280]: If $a_1, a_2, \ldots, a_n$ is a martingale and have a mean $Ea > 0$, then

$$\Pr\big(\min(a_1, a_2, \ldots, a_n) \le 0\big) \le \frac{\mathrm{Var}(a_n)}{(Ea)^2}.$$

Here for each $i$ the $m_i^{\,i+kM-1}$'s, $k = 0, 1, \ldots$, form a martingale and have mean $N/M$ and variance $(N+i+kM-1)\,\frac{1}{M}\left(1 - \frac{1}{M}\right)$.

Thus

$$\Pr(\text{run} \le j) \le M\,\frac{(N+j)\,\frac{1}{M}\left(1 - \frac{1}{M}\right)}{\left(\frac{N}{M}\right)^2}, \qquad j = 0, 1, \ldots$$

and

$$\Pr(\text{run} > x) \ge \max\left(0,\ \frac{N^2 - (N+x)\,M(M-1)}{N^2}\right), \qquad x \ge 0$$

$$E(\text{run}) = \int_0^\infty \Pr(\text{run} > x)\,dx \ \ge\ \frac{1}{2}\,\frac{\left(N - M(M-1)\right)^2}{M(M-1)}, \qquad N \ge M(M-1)$$

This shows that $E(\text{run})$ increases at least proportionally to $N^2$ for large $N$, as desired.
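To get a feeling for how loose this bound is, the following sketch compares it with the exact value for $M = 3$, $E(\text{run}) = N(N^2+1)/(3N+1)$. The function names are hypothetical, and the bound is taken to be zero when $N < M(M-1)$, where the inequality above says nothing.

```python
from fractions import Fraction


def run_lower_bound(n, m):
    """Lower bound (N - M(M-1))^2 / (2 M (M-1)) on the expected run length."""
    c = m * (m - 1)
    if n < c:
        return Fraction(0)  # outside the range where the bound applies
    return Fraction((n - c) ** 2, 2 * c)


def exact_run_m3(n):
    """Exact expected run length for M = 3, N(N^2 + 1) / (3N + 1)."""
    return Fraction(n * (n * n + 1), 3 * n + 1)


if __name__ == "__main__":
    for n in (6, 12, 24, 48, 96):
        bound, exact = run_lower_bound(n, 3), exact_run_m3(n)
        print(n, float(bound), float(exact))
```

Both quantities grow like $N^2$, with ratio approaching $1/4$ for $M = 3$ (the bound behaves like $N^2/12$, the exact value like $N^2/3$).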
                 Strategy

   N      II      III     IV      V

   1      1       1       1       1
   2     .75      1      .678    .599
   3     .604    .5      .519    .390
   4     .508    .5      .423    .274
   5     .440    .333    .358    .203
   6     .389    .333    .312    .157
   7     .350    .250    .276    .126
   8     .318    .250    .248    .103
   9     .292    .200    .226    .086
  10     .271    .200    .208    .073
  11     .252    .167    .192    .063
  12     .237    .167    .179    .055
  13     .223    .143    .167    .048
  14     .211    .143    .158    .043
  15     .200    .125    .149    .038
  16     .190    .125    .141    .035

Table 5.1: h as a function of N (M = 2, p_1 = p_2 = .5)


E.  A Lower  Bound  on  h 
We have shown in Section D that simple strategies (i.e. II) can make $h$ decrease like $1/N$; more complicated strategies (i.e. V) yield a decrease proportional to $(\log_2 N)/N^2$. We will show here that $h$ cannot decrease faster than $\left((M-1)/(M+N-1)\right)^2$. We will use in the sequel many standard relations between information theoretic expressions; they can be found in [Gallager, 1968].

Assume  that  we  have  decided  on  a feasible  strategy.  We  have  that 
for  all  T . 


T ^ i^O  “^[O.T)^ 


h ^ lim  sup 

(in  fact  we  have  equality,  but  this  requires  a little  proof) 

"it) 

t=l,2,3,...  (3) 

We  now  lowerbound  ^ ; 

We  have  , jit , (i*l]  t)  ' ®it)  I- F ^ ^^[it , (i*i)  t) , (i.l)  t] 

l^[0,it)'  ^it^ 

where  I(A;B)  :=  H(A)  - H(A|B) 


H(B)  - HCB|A) 
l’'[0.U)- V

by  the  Data  Processing  Theorem 
[Gallager,  1968,  p.  80]. 


by  independence  of  the  's. 


Repeating  relation  (1) 


‘^'^^[it.(i^l)t)^  = ^Ci^Dt  ^ ‘^*^^[it.(i+l)t)^  ■ ^it 

and  remembering  that  S,.  ..  can  take  a different  values,  we  see 

+ i j t 

that  for  every  U(Yj. ^j)  and  Su  ■ “<’'[it.(ltl)t)'  can  take  at 


most  a different  values. 


Thus  “^^•^^[it,Ci+l)t)^  l^^^[it,Ci+l)t)^’  ^[0,it)'  ^it^  1 ^082  ^ 

Writing  (i+l)t)^^  “ ^ Section  D,  and  replacing  in 


(3)  one  obtains 

$$h \ge \max_{t = 1, 2, \ldots}\ \frac{1}{t}\left(H(P^t) - \log_2 \sigma\right) \qquad (4)$$

This can easily be computed.
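As an illustration of how (4) might be evaluated, here is a sketch for $M = 2$. It rests on two assumptions that should be checked against Section D: that $H(P^t)$ is the entropy of the number of messages from source 1 among $t$ arrivals, i.e. of a Binomial$(t, p_1)$ distribution, and that the number of buffer states is $\sigma = N$. The function names and the cutoff `t_max` are hypothetical.

```python
from math import comb, log2


def binomial_entropy(t, p):
    """Entropy in bits of a Binomial(t, p) distribution."""
    h = 0.0
    for k in range(t + 1):
        q = comb(t, k) * p ** k * (1 - p) ** (t - k)
        if q > 0:
            h -= q * log2(q)
    return h


def lower_bound_h(n, p1, t_max=200):
    """max over t of (H(P^t) - log2(sigma)) / t, assuming M = 2 and sigma = N."""
    sigma = n  # assumed number of buffer states for two sources
    return max(
        (binomial_entropy(t, p1) - log2(sigma)) / t for t in range(1, t_max + 1)
    )
```

Under these assumptions, `lower_bound_h(1, 0.5)` equals 1, which agrees with the first row of Table 5.1, and `lower_bound_h(2, 0.5)` is about .27, below every entry of the row $N = 2$.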

We  are  interested  in  an  asymptotic  relation  for  large  N . Using 


•tn  . M-1 


1 a 


HCP  ) 1—  lo^2irt)  + j log^  p^ 


, , e a 

* 27  |M- 


2 M-1 


n P, 


in  (4) 
one obtains (neglecting the integer constraint on t)


30 


1 

(log2  e) 

f M ^ 

LllLiJ 

e 

.M*2  , o=N  , 

2ir  (log-  e] 

K > ^ 

i Pi  P2 

e N 


,854 


if  Pi  - P2  = 2 


One  can  show  that 


,N+M-2, 

’ ‘ M-l  ) i 


SO 


M+N-2 

r M-1  J 


M-1 


h > 


M 

M-1 

2ir(log2  e) 

1 

11  = 1 JJ 

M-1  I 


F.  "Optimal"  Strategy 

As explained in Section C, a strategy consists of two parts:

-- a rule to determine the value of $Y_i$;

-- a code to indicate the value of $Y_i$.

The first part is the most interesting. We will gain some insight into it by assuming that non integer codeword lengths can be used subject only to the Kraft inequality [Gallager, 1968, p. 47]; in that case it is very easy to solve the second part.

Let's  assume  that  one  has  decided  how  to  select  that  Y^^  's;  then 
for  all  encoding  strategies 


0 

I 
‘ Pi

'''tO,i)">'(0.l)>  ‘°®2  “’’■‘V>'il''tO,ir>'[0.i)”. 

This lowerbound can be achieved by using at time $i$ a codeword of length

$$-\log_2 \Pr\big(Y_i = y_i \mid Y_{[0,i)} = y_{[0,i)}\big)$$

if $Y_i = y_i$ and $Y_{[0,i)} = y_{[0,i)}$.

This codeword provides just enough information to enable the receiver to recognize $Y_i$. A consequence of this is that the conditional probabilities

$$\Pr\big(S_i = s_j \mid Y_{[0,i)} = y_{[0,i)},\ \text{codewords transmitted between } 0 \text{ and } i\big) = \Pr\big(S_i = s_j \mid Y_{[0,i)} = y_{[0,i)}\big)$$

Note  that  this  is  not  true  for  all  encoding  strategies:  in  Strategy  II, 
the  codeword  transmitted  at  time  .75  specifies  not  only  Y^,  but  also 
. Thus  in  general  Pr (Sj^=Sj  |Y^=k  , codeword  transmitted  at  .75) 
PrCS^=s.lY^=k)  . 

Now that we have "solved" the second part of the problem, we can turn our attention to the first part: how should we choose the $Y_i$'s so as to minimize

$$\limsup_{T \to \infty} \frac{1}{T}\, H(Y_{[0,T)}) = \limsup_{T \to \infty}\ -\frac{1}{T} \sum_{i=0}^{T-1} \sum_{y_{[0,i]}} \Pr\big(Y_{[0,i]} = y_{[0,i]}\big) \log_2 \Pr\big(Y_i = y_i \mid Y_{[0,i)} = y_{[0,i)}\big) \qquad (4)$$

It turns out that this can be done by dynamic programming.

Unfortunately we need first to give some more definitions:

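$\Sigma^s$ denotes the unit simplex of $\mathbb{R}^s$.

$u$ denotes a column matrix of suitable dimension (depending on the context) with all components equal to 1.

$e_j$ denotes a row matrix of suitable dimension (depending on the context) with all components equal to 0, except the $j$th, which is equal to 1.

$\Pi_i(y_{[0,i)})$ is a $\sigma$-tuple whose $j$th component is equal to $\Pr(S_i = s_j \mid Y_{[0,i)} = y_{[0,i)})$.

Similarly, $\Pi_i^+(y_{[0,i)})$ is a $\sigma^+$-tuple whose $j$th component is equal to $\Pr(S_i^+ = s_j^+ \mid Y_{[0,i)} = y_{[0,i)})$, where $S_i^+$ is the buffer state just after the arrival at time $i$.

By independence of the origins of successive arrivals, one can write

$$\Pi_i^+(y_{[0,i)}) = \Pi_i(y_{[0,i)})\, P \qquad (5)$$

where $P$ is a $(\sigma, \sigma^+)$ stochastic matrix whose element $P_{ij}$ is equal to $p_k$ if $s_j^+ = s_i + e_k$, and 0 if there is no such $k$.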

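EXAMPLE: $M = 2$, $N = 2$, $\sigma = 2$, $\sigma^+ = 3$.

If $s_1 = (1,0)$, $s_2 = (0,1)$, $s_1^+ = (2,0)$, $s_2^+ = (1,1)$, $s_3^+ = (0,2)$, then

$$P = \begin{pmatrix} p_1 & p_2 & 0 \\ 0 & p_1 & p_2 \end{pmatrix}$$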

A policy $\alpha$, $\alpha = 1, 2, \ldots, t$ ($t$ will be defined later), is characterized by a $(\sigma^+, M)$ policy matrix $A^\alpha$ with the following properties:

1) $A^\alpha_{ij} = 0$ or 1

2) $\sum_j A^\alpha_{ij} = 1$

3) $A^\alpha_{ij} = 1$ only if the state $s_i^+$ contains a message from source $j$.

The significance of this is that if at time $k + .5$ the state is $s_i^+$, one will choose $Y_k := m$ such that $A^\alpha_{im} = 1$. Properties 1) and 2) guarantee that a unique such $m$ exists, and 3) guarantees that only messages that are actually in the buffer may be sent.

Matrix $A^\alpha$ has the following additional properties, which are easy to verify:

1) $A^\alpha$ is stochastic

2) If policy $\alpha$ is used at time $i$, the conditional probability that $Y_i = k$ given $Y_{[0,i)} = y_{[0,i)}$ is equal to the $k$th component of $\Pi_i^+(y_{[0,i)})\,A^\alpha$, which is equal (by (5)) to $\Pi_i(y_{[0,i)})\,P\,A^\alpha$.


EXAMPLE: $M = 2$, $N = 2$ as before.

There are only two policies, 1 and 2, with

$$A^1 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \qquad A^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}$$

In both cases, one transmits a "1" in state (2,0) and a "2" in state (0,2) (there is no other choice); policy 1 transmits a "1" in state (1,1), whereas policy 2 transmits a "2."

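If $\Pi_i(y_{[0,i)}) = (\rho_1, \rho_2)$ and policy 1 is used,

$$\big(\Pr(Y_i = 1 \mid Y_{[0,i)} = y_{[0,i)}),\ \Pr(Y_i = 2 \mid Y_{[0,i)} = y_{[0,i)})\big) = (\rho_1\ \ \rho_2) \begin{pmatrix} p_1 & p_2 & 0 \\ 0 & p_1 & p_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = (\rho_1 + \rho_2 p_1,\ \rho_2 p_2)$$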

Note that the number, $t$, of policies can be quite large: if messages from $k$ origins are present in state $s_i^+$, the $i$th row of a policy matrix can take $k$ distinct values. The number of states with messages from $k$ origins present is in turn equal to $\binom{M}{k}\binom{N-1}{k-1}$ (with $\binom{a}{b} := 0$ if $a < b$), where the first factor is the number of distinct choices of $k$ origins, and the second factor represents the number of ways of distributing $N$ messages between $k$ origins, in such a way that each origin receives at least one message. This last number is equal to the number of ways of distributing $N-k$ messages between $k$ origins. Thus there are

$$t = \prod_{k=1}^{M} k^{\binom{M}{k}\binom{N-1}{k-1}}$$

distinct policies.

EXAMPLE: We have seen that if $M = N = 2$, there are 2 policies. In the seemingly innocuous case $M = 4$, $N = 8$, there are about $6.22 \times 10^{73}$ policies.
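The count is easy to reproduce. The sketch below simply evaluates the product $\prod_k k^{\binom{M}{k}\binom{N-1}{k-1}}$; the function name is hypothetical.

```python
from math import comb


def number_of_policies(m, n):
    """Number of distinct policy matrices for m sources and a buffer of n messages."""
    total = 1
    for k in range(1, m + 1):
        # States in which exactly k origins are represented; comb returns 0
        # when k - 1 > n - 1, so impossible cases contribute a factor of 1.
        states_with_k_origins = comb(m, k) * comb(n - 1, k - 1)
        total *= k ** states_with_k_origins
    return total
```

For $M = 4$, $N = 8$ the product is $2^{112} \cdot 3^{84}$, which is the figure of about $6.22 \times 10^{73}$ quoted above.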


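Associated to policy $\alpha$ we define $M$ $(\sigma^+, \sigma)$ transition matrices $B^{\alpha,k}$, $k = 1, 2, \ldots, M$, by

$$B^{\alpha,k}_{ij} = \begin{cases} 1 & \text{if and only if } A^\alpha_{ik} = 1 \text{ and } s_j = s_i^+ - e_k \\ 0 & \text{otherwise} \end{cases}$$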
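These matrices have the following properties; they are proved by direct examination.

1) $B^{\alpha,k}\,u$ = $k$th column of $A^\alpha$;

2) If policy $\alpha$ is used at time $i$, $\Pr(Y_i = k \mid Y_{[0,i)} = y_{[0,i)}) = \Pi_i^+(y_{[0,i)})\,B^{\alpha,k}\,u$;

3) If policy $\alpha$ is used at time $i$ and $Y_i = k$,

$$\Pi_{i+1}(y_{[0,i+1)}) = \frac{\Pi_i^+(y_{[0,i)})\,B^{\alpha,k}}{\Pi_i^+(y_{[0,i)})\,B^{\alpha,k}\,u} \qquad (6)$$

(this is Bayes' rule).

Property 2) justifies the appellation of "transition matrix." Using (5), (6) can be written as

$$\Pi_{i+1}(y_{[0,i+1)}) = \frac{\Pi_i(y_{[0,i)})\,P\,B^{\alpha,k}}{\Pi_i(y_{[0,i)})\,P\,B^{\alpha,k}\,u}$$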

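EXAMPLE: $M = 2$, $N = 2$ as before.

Associated to policy 1 (defined earlier), we have the matrices

$$B^{1,1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad B^{1,2} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix}$$

As an illustration, $B^{1,1}_{22} = 1$ because state (0,1) can be obtained from (1,1) by removing (1,0), and because if policy 1 is used, a "1" is transmitted if the state is (1,1).

Say policy 1 is used at time $i$, with

$$\Pi_i(y_{[0,i)}) = (\rho_1, \rho_2), \qquad \Pi_i^+(y_{[0,i)}) = (\rho_1 p_1,\ \rho_1 p_2 + \rho_2 p_1,\ \rho_2 p_2).$$

Then if $Y_i = 1$, (6) gives

$$\Pi_{i+1}(y_{[0,i+1)}) = \frac{(\rho_1 p_1,\ \rho_1 p_2 + \rho_2 p_1)}{\rho_1 + \rho_2 p_1}.$$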
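The Bayes update of the state probability vector can be sketched as follows for the $M = 2$, $N = 2$ example, with policy 1 sending a "1" in state (1,1). The helper names are hypothetical, the numerical values of $p_1$ and of the prior are arbitrary, and exact fractions are used so that the result can be compared with the hand computation.

```python
from fractions import Fraction


def row_times_matrix(row, matrix):
    """Product of a row vector with a matrix given as a list of rows."""
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(len(matrix[0]))
    ]


def belief_update(prior, P, B):
    """Return (posterior, probability of the observed transmission).

    prior is the state probability vector before the arrival, P the arrival
    matrix and B the transition matrix of the policy for the observed origin.
    """
    unnormalized = row_times_matrix(row_times_matrix(prior, P), B)
    probability = sum(unnormalized)
    if probability == 0:
        raise ValueError("the observed transmission has probability zero")
    return [x / probability for x in unnormalized], probability


p1, p2 = Fraction(3, 10), Fraction(7, 10)
P = [[p1, p2, 0], [0, p1, p2]]
B_1_1 = [[1, 0], [0, 1], [0, 0]]  # policy 1, a "1" was transmitted
B_1_2 = [[0, 0], [0, 0], [0, 1]]  # policy 1, a "2" was transmitted

prior = [Fraction(1, 2), Fraction(1, 2)]
posterior_after_1, prob_1 = belief_update(prior, P, B_1_1)
posterior_after_2, prob_2 = belief_update(prior, P, B_1_2)
```

Here `prob_1` equals $\rho_1 + \rho_2 p_1$ and `prob_2` equals $\rho_2 p_2$; after a "2" is transmitted under policy 1 the receiver knows that the buffer holds a single message from source 2.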
We have that H(Y^q T+1)^ ’ ^T-t-l ‘ called the cost to go at time

T-i+1  . Using  Bellman's  principle  of  optimality  [Bellman,  1957]  we  see 
that  this  expression  can  be  minimized  by  going  backward  in  time:  at 
time  T-i  , for  every  sequence  y^^  , we  should  find  a strategy 

such  that  the  resulting  values  of  Pr(Y^_^»k | T-i)“^[0  T i)^  ’ 
k=1,2,...,M , minimize .

In  a first  step  we  will  minimize  T+1)^  over  all  strategies 

consisting  of  using  at  time  i a policy  otCXi-rt  •i)  • ^e  will  show 
later  that  nothing  is  gained  by  using  more  general  strategies. 


At  time  T the  receiver  has  computed  tj^  ’ trans- 
mitter, which  can  also  compute  j^)  » decides  to  use  policy  a , 


one  checks  that 


One sees that there is a policy $\alpha_1(\Pi_T(y_{[0,T)}))$, depending on $y_{[0,T)}$ through $\Pi_T(y_{[0,T)})$, that minimizes $H(\Pi_T(y_{[0,T)})\,P\,A^\alpha)$ over all policies. We denote the minimum by $V_1(\Pi_T(y_{[0,T)}))$. Thus $V_1$ (called the minimal cost to go at time $T$) is defined by

$$V_1(\Pi) := \min_\alpha H(\Pi\,P\,A^\alpha) = H(\Pi\,P\,A^{\alpha_1(\Pi)}) \qquad (8)$$

It is aesthetically pleasant to define $V_0(\Pi) := 0 \qquad (9)$

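EXAMPLE: $N = 2$, $M = 2$ as before, with $\Pi = (\rho_1, \rho_2)$.

If policy 1 is used,

$$H(\Pi\,P\,A^1) = H\big((\rho_1 + \rho_2 p_1,\ \rho_2 p_2)\big)$$

whereas if policy 2 is used

$$H(\Pi\,P\,A^2) = H\big((\rho_1 p_1,\ \rho_1 p_2 + \rho_2)\big).$$

One sees that policy 2 minimizes the expected immediate cost if $\rho_1 p_1 \le \rho_2 p_2$, i.e. if $\rho_1 \le p_2$.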
As we have seen earlier, the number of policies can be enormous. We show now that at most $M!$ policies need to be considered when one minimizes $H(\Pi\,P\,A^\alpha)$.

THEOREM I

Let $\alpha_0$ be a policy minimizing the expected immediate cost $H(\Pi\,P\,A^\alpha)$ for a given $\Pi$ in $\Sigma^\sigma$. Denote the $i$th component of $\Pi P$ by $\rho_i^+$.

Let $\tau := \Pi\,P\,A^{\alpha_0}$.

For the given $\Pi$, for all $i$ such that $\rho_i^+ > 0$ define the relation $>$ on $\{1, 2, \ldots, M\}$ by:

if $A^{\alpha_0}_{ij} = 1$, then $j > k$ for all $k \ne j$ such that $s_i^+$ contains a message from $k$.

Then $>$ is a partial ordering of $\{1, 2, \ldots, M\}$.

Proof: 

We  must  prove  that  if  ^ ^2 


Jn-1  " ^n 

then  it  is  not  true  that  . Assume  to  the  contrary  that  ^ 

Without  loss  of  generality,  assume  that  state  s!^  contains 

a 

messages  from  and  . . i=l,2,...n  , and  that  A.?  = 1 . 

Because  is  optimal 


-"jlog^^j  - "jlog/j  1-CTj  -o^log^T.  -0^  - (T.  ^(^)10g2(T.  ^0^)  (10. a) 

112  2 1 1 
otherwise  H(nPA^) 

3 

A ? - 1 . 


3 2 

would  be  reduced  by  making  Aj^?  = 0 and 


- 1 . 


Relation  (10. a)  can  be  rewritten  as 
(T. -oJlog.,CT -0 ) - T log T
Jl  i “ Jl  ^ h h 

The  function  x log.,x  - Cx+P)  log2 

so  (10. b)  implies 


< T log  T.  - (t.  ♦pjlog  (t  +0  ) 

J.j  ^ J 0 J ^ ^ J y ^ 

(10. b) 

decreases  with  increasing  x for  p>0. 


Similarly 


T - P.  1 t i=l,2,...,n 

^i  ^i  mod  n)t-l 

Adding  these  inequalities  one  obtains 


n 

Z p . > 0 
i=l  ^ - 


which  is  a contradiction. 

Q.c.D. 

Because a partially ordered finite set can be totally ordered, we have the following theorem:

THEOREM II

There is a policy $\alpha$ minimizing $H(\Pi\,P\,A^\alpha)$ which has the form

-- define an ordering $>$ on $\{1, 2, \ldots, M\}$

-- $A^\alpha_{ij} = 1$ if $j$ is represented in $s_i^+$ and if $j > k$ for all $k \ne j$ represented in $s_i^+$.

There are at most $M!$ such policies.

Q.E.D.

An algorithm that comes naturally to mind, but which does not quite work, to define the ordering $>$ is the following:

-- for $j = 1, 2, \ldots, M$ compute from $\Pi$ the probabilities $p_k^j$, $k = 1, 2, \ldots, M$, that at time $i + .5$ the buffer contains at least one message from source $k$, but none from sources $k_1, k_2, \ldots, k_{j-1}$. Let $k_j := \min\{k : p_k^j \ge p_m^j,\ m = 1, 2, \ldots, M\}$.

-- Define $>$ on $\{1, 2, \ldots, M\}$ by $k_1 > k_2 > \cdots > k_M$.

The idea behind this algorithm is to send a message from the origin that is the most likely to be represented in the buffer. If this is impossible (because no such message is in the buffer), we try the next most likely origin and so on.

Here is a counterexample showing that the resulting policy is not necessarily optimal.


EXAMPLE: $N = 2$, $M = 3$

$p_1 = .2$, $p_2 = .6$, $p_3 = .2$

$s_1 = (1,0,0)$, $s_2 = (0,1,0)$, $s_3 = (0,0,1)$

$\Pi = (.475, .05, .475)$

$s_1^+ = (2,0,0)$, $s_2^+ = (1,1,0)$, $s_3^+ = (1,0,1)$, $s_4^+ = (0,2,0)$, $s_5^+ = (0,1,1)$, $s_6^+ = (0,0,2)$

One finds

$\Pi^+ = (.095, .295, .19, .03, .295, .095)$

$p_1^1 = .58$, $p_2^1 = .62$, $p_3^1 = .58$, $k_1 = 2$

$p_1^2 = .285$, $p_2^2 = 0$, $p_3^2 = .285$, $k_2 = 1$

$p_1^3 = 0$, $p_2^3 = 0$, $p_3^3 = .095$, $k_3 = 3$

so $2 > 1 > 3$.

The resulting $H(\cdot) = -.62 \log_2(.62) - .285 \log_2(.285) - .095 \log_2(.095) = 1.26$

However, the ordering $1 > 3 > 2$ results in the cost

$H(\cdot) = -.58 \log_2(.58) - .39 \log_2(.39) - .03 \log_2(.03) = 1.13$
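The counterexample can be replayed numerically. The sketch below assumes $p = (.2, .6, .2)$, which is the only choice consistent with the vector $(.095, .295, .19, .03, .295, .095)$ above; sources are numbered from 0 in the code, and the function names are hypothetical. It evaluates the immediate cost of every ordering policy of Theorem II and of the ordering produced by the greedy algorithm.

```python
from itertools import permutations
from math import log2

# Buffer states after the arrival and their probabilities.
AFTER_STATES = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]
PI_PLUS = [0.095, 0.295, 0.19, 0.03, 0.295, 0.095]


def entropy(probs):
    """Entropy in bits, ignoring zero-probability outcomes."""
    return -sum(q * log2(q) for q in probs if q > 0)


def cost_of_ordering(order):
    """Immediate cost when the first source of `order` present in the buffer is served."""
    sent = [0.0, 0.0, 0.0]
    for state, weight in zip(AFTER_STATES, PI_PLUS):
        served = next(src for src in order if state[src] > 0)
        sent[served] += weight
    return entropy(sent)


def greedy_ordering():
    """Repeatedly pick the source most likely to be present among the states
    containing none of the sources already picked; ties go to the smallest index."""
    order = []
    for _ in range(3):
        scores = [
            sum(
                w
                for st, w in zip(AFTER_STATES, PI_PLUS)
                if st[k] > 0 and all(st[j] == 0 for j in order)
            )
            for k in range(3)
        ]
        order.append(max(range(3), key=lambda k: (scores[k], -k)))
    return tuple(order)


best_cost = min(cost_of_ordering(order) for order in permutations(range(3)))
```

The greedy ordering is `(1, 0, 2)`, i.e. $2 > 1 > 3$ in the numbering of the text, with a cost of about 1.27 bits, whereas the best ordering costs about 1.14 bits; the text quotes both values truncated to two decimals.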


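At time $T-1$, or more generally at time $T-i$, $i = 1, 2, \ldots, T$, the receiver has computed $\Pi_{T-i}(y_{[0,T-i)})$. The transmitter must find a policy $\alpha(y_{[0,T-i)})$ minimizing (from (7))

$$C_i(y_{[0,T-i)}) + \sum_{k=1}^{M} \Pr\big(Y_{T-i} = k \mid Y_{[0,T-i)} = y_{[0,T-i)}\big)\ V_i\big(\Pi_{T-i+1}((y_{[0,T-i)}, k))\big)$$

We have seen earlier that if policy $\alpha$ is used

$$C_i(y_{[0,T-i)}) = H\big(\Pi_{T-i}(y_{[0,T-i)})\,P\,A^\alpha\big)$$

$$\Pr\big(Y_{T-i} = k \mid Y_{[0,T-i)} = y_{[0,T-i)}\big) = \Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}\,u$$

$$\Pi_{T-i+1}((y_{[0,T-i)}, k)) = \frac{\Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}}{\Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}\,u}$$

Thus policy $\alpha(y_{[0,T-i)})$ must minimize

$$H\big(\Pi_{T-i}(y_{[0,T-i)})\,P\,A^\alpha\big) + \sum_{k=1}^{M} \Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}\,u\ \ V_i\!\left(\frac{\Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}}{\Pi_{T-i}(y_{[0,T-i)})\,P\,B^{\alpha,k}\,u}\right)$$

Clearly there is an optimal policy, $\alpha_{i+1}(\Pi_{T-i}(y_{[0,T-i)}))$, which depends on $y_{[0,T-i)}$ through $\Pi_{T-i}(y_{[0,T-i)})$. We define $V_{i+1}(\Pi)$, the minimal cost to go at time $T-i$, by

$$V_{i+1}(\Pi) := \min_\alpha \left[ H(\Pi\,P\,A^\alpha) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha,k}\,u\ \ V_i\!\left(\frac{\Pi\,P\,B^{\alpha,k}}{\Pi\,P\,B^{\alpha,k}\,u}\right) \right] \qquad (11)$$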

THEOREM III

$V_i(\Pi)$ is a continuous function of $\Pi$.

Proof:

By continuity of $H(\cdot)$ and induction on $i$.

Q.E.D.

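THEOREM IV

Let $A$ be a $(s,t)$ stochastic matrix.

Then: $\Pi u\ H\!\left(\frac{\Pi A}{\Pi u}\right)$ is a concave function of $\Pi$ for $\Pi$ in the set of $s$-tuples with non-negative components.

Proof:

Let $\Pi_1$ and $\Pi_2$ be two such $s$-tuples; let $(q_1^1, \ldots, q_t^1) := \Pi_1 A$ and $(q_1^2, \ldots, q_t^2) := \Pi_2 A$.

Then: for $\lambda \in [0,1]$

$$\lambda\,\Pi_1 u\ H\!\left(\frac{\Pi_1 A}{\Pi_1 u}\right) + (1-\lambda)\,\Pi_2 u\ H\!\left(\frac{\Pi_2 A}{\Pi_2 u}\right) - (\lambda \Pi_1 + (1-\lambda)\Pi_2)\,u\ H\!\left(\frac{(\lambda \Pi_1 + (1-\lambda)\Pi_2)\,A}{(\lambda \Pi_1 + (1-\lambda)\Pi_2)\,u}\right)$$

$$= \lambda \sum_{j=1}^{t} q_j^1 \log_2 \frac{\left(\sum_{i=1}^{t} q_i^1\right)\left[\lambda q_j^1 + (1-\lambda) q_j^2\right]}{q_j^1 \sum_{i=1}^{t}\left[\lambda q_i^1 + (1-\lambda) q_i^2\right]} + (1-\lambda) \sum_{j=1}^{t} q_j^2 \log_2 \frac{\left(\sum_{i=1}^{t} q_i^2\right)\left[\lambda q_j^1 + (1-\lambda) q_j^2\right]}{q_j^2 \sum_{i=1}^{t}\left[\lambda q_i^1 + (1-\lambda) q_i^2\right]}$$

$$\le 0 \qquad \text{because } \log_e x \le x - 1.$$

Q.E.D.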


If $s = t$ and $A$ is the unit matrix, this gives the well known result that $H(\Pi)$ is a concave function of $\Pi$ for $\Pi$ in $\Sigma^s$.

COROLLARY IV.1: Let $A$ be a $(s,t)$ stochastic matrix and $C$ be a $(r,s)$ nonnegative matrix.

Then: $\Pi C u\ H\!\left(\frac{\Pi C A}{\Pi C u}\right)$ is a concave function of $\Pi$ for $\Pi$ in the set of $r$-tuples with non negative components.

Proof:

The components of $\Pi C$ are non negative and the composition of a concave function and a linear function is concave.

Q.E.D.

COROLLARY IV.2: For all $(s,\sigma)$ non negative matrices $C$, for all $i \ge 0$, $\Pi C u\ V_i\!\left(\frac{\Pi C}{\Pi C u}\right)$ is a concave function of $\Pi$, for $\Pi$ in the set of $s$-tuples with non negative components.

Proof:

By induction on $i$: $V_0 = 0$, thus $V_0$ is concave. By (11),

$$V_{i+1}(\cdot) = \min_\alpha \left[ H(\cdot\,P\,A^\alpha) + \sum_{k=1}^{M} (\cdot)\,P\,B^{\alpha,k}\,u\ \ V_i\!\left(\frac{(\cdot)\,P\,B^{\alpha,k}}{(\cdot)\,P\,B^{\alpha,k}\,u}\right) \right]$$

so

$$\Pi C u\ V_{i+1}\!\left(\frac{\Pi C}{\Pi C u}\right) = \min_\alpha \left[ \Pi C u\ H\!\left(\frac{\Pi C\,P\,A^\alpha}{\Pi C u}\right) + \sum_{k=1}^{M} \Pi C\,P\,B^{\alpha,k}\,u\ \ V_i\!\left(\frac{\Pi C\,P\,B^{\alpha,k}}{\Pi C\,P\,B^{\alpha,k}\,u}\right) \right]$$

The terms in the right hand side member of this equation all are concave by the previous corollary and induction on $i$. The minimum of a set of concave functions is concave.

Q.E.D.

We  are  now  in  a position  to  prove  that  nothing  is  gained  by 
using  more  general  strategies  than  what  we  have  considered  until  now, 
i.e.  strategies  where  at  time  T-i  one  uses  a policy  determined  by 

^[O.i)  • 


THEOREM V


Denote  by  T i)^  cost  to  go  at  time  T-i  if  one 

uses  a given  causal  strategy  (i.e.  P^CV*^  ^ . 

^[i.T+l)"’'[i.T+n^  = ^''^^i"’'l'^[0,i)=^[0,i)’''^[0,i)='‘[0,i)^- 


Dq  0 . 

Then:  i=0,l,2, . . . ,T+1 

Proof : 

By  induction  on  i . 

--  n *>  V 

‘■0-0 


Suppose  i , then  from  (7] 
- ^i*^^[0,T-i)^ * ^^'^^T-i"'^'^[0,T-ir^[0,T-i)^-

Let  the  matrix  A*  be  defined  by 

A*.  :=  PTCY^.i=j  |Sj.i=Sjj,YjQ^^_.^=yjQ^^_.^) 

An  instant  of  reflexion  will  convince  the  reader  that  A*  can  be 
written  as  a convex  combination  of  policy  matrices; 


A*=LcA  , c>0  Zc=l 

, a ’ a — , a 

a=l  a=l 

+ * k 

Defining  similarly  the  M (a  ,a)  matrices  B ’ k=l,2,...,M 

®nj  ' ^T-i"'^^^T-i"^n'^[0,T-i)^^[0,T-i)^ 


one  has  that  B 


= Z c B^*’^ 
a=l 


AS  before  Pr(Y^_.  =k  | Y ^ =y  ^ ) = Vi(y[0,T-l) 


* k 

)B  ’ u 


and  by  causality 


fO  T-1)^  ^ 

Vi.i^^yro  T-ir*^^^  = --  ^ ’ ^ ^ 

1 LU,1  IJ  n (v  IP  B ’^u 


riius  from  (12) 


M I T 


k=l  ;a=l  L • •> 


<=a("T.l(>'tO,T-i))'’»“’'^' 


By  Theorem  IV  and  Corollary  IV. 2,  the  right  hand  side  is  a concave 
function of (c^ ,C2 , . . . , and thus takes its minimal value at a
vertex  of  , say  e^  . 

Thus 


°i+l'^[0,T-i)^  - “‘^'^-i'^^fO.T-i)^’’  ^ k-1  • 

,8,k  ' 


V. 

1 


S-i^^[0.T-i)^^  ^ 


- '^i+l*'^T-i‘-^[0,T-i)^^ 


by  (11) 


Q.E.D. 


$V_i(\Pi)$ is naturally a nondecreasing function of $i$; the next theorem says something about the behavior of the increase.

THEOREM VI [Odoni, 1969]

$$\min_\Pi \big(V_{i+1}(\Pi) - V_i(\Pi)\big) \ge \min_\Pi \big(V_i(\Pi) - V_{i-1}(\Pi)\big)$$

$$\max_\Pi \big(V_{i+1}(\Pi) - V_i(\Pi)\big) \le \max_\Pi \big(V_i(\Pi) - V_{i-1}(\Pi)\big)$$

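Proof:

From (11)

$$V_{i+1}(\Pi) = H\big(\Pi\,P\,A^{\alpha_{i+1}(\Pi)}\big) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u\ \ V_i\!\left(\frac{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}}{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u}\right)$$

$$V_i(\Pi) \le H\big(\Pi\,P\,A^{\alpha_{i+1}(\Pi)}\big) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u\ \ V_{i-1}\!\left(\frac{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}}{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u}\right)$$

so

$$V_{i+1}(\Pi) - V_i(\Pi) \ge \sum_{k=1}^{M} \Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u \left[ V_i\!\left(\frac{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}}{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u}\right) - V_{i-1}\!\left(\frac{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}}{\Pi\,P\,B^{\alpha_{i+1}(\Pi),k}\,u}\right) \right]$$

and

$$\min_\Pi \big(V_{i+1}(\Pi) - V_i(\Pi)\big) \ge \min_\Pi \big(V_i(\Pi) - V_{i-1}(\Pi)\big).$$

The other statement is proved by replacing $\alpha_{i+1}(\Pi)$ by $\alpha_i(\Pi)$.

Q.E.D.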

Because the $V_i$'s are increasing, it is inconvenient to work with them numerically. We note that $\alpha_{i+1}(\Pi)$ will still minimize the right hand side member of (11) if $V_i$ is translated by a constant. This leads to the definition of $v_i$ and $\bar{v}_i$ as follows.

$$v_0(\Pi) := 0$$

$$\bar{v}_{i+1}(\Pi) := \min_\alpha \left[ H(\Pi\,P\,A^\alpha) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha,k}\,u\ \ v_i\!\left(\frac{\Pi\,P\,B^{\alpha,k}}{\Pi\,P\,B^{\alpha,k}\,u}\right) \right], \qquad i = 0, 1, \ldots \qquad (12)$$

$$v_{i+1}(\Pi) := \bar{v}_{i+1}(\Pi) - \bar{v}_{i+1}(e_1)$$

One checks by induction that $v_{i+1}(\Pi) = V_{i+1}(\Pi) - V_{i+1}(e_1)$, and that $v_i(e_1) = 0$ for all $i$.

$v_i(\Pi)$ can be interpreted as the relative cost of having a state probability vector $\Pi$ at time $T-i+1$.

Theorem VI can be rewritten as

$$\min_\Pi \big(\bar{v}_i(\Pi) - v_{i-1}(\Pi)\big) \le \min_\Pi \big(\bar{v}_{i+1}(\Pi) - v_i(\Pi)\big) \le \max_\Pi \big(\bar{v}_{i+1}(\Pi) - v_i(\Pi)\big) \le \max_\Pi \big(\bar{v}_i(\Pi) - v_{i-1}(\Pi)\big)$$

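We turn now to the discussion of the infinite $T$ case. It is natural to assume that there exist functions $\alpha_\infty$ and $v_\infty$, and a constant $g$, such that

$$\lim_{i \to \infty} \alpha_i = \alpha_\infty, \qquad \lim_{i \to \infty} v_i = v_\infty, \qquad \lim_{i \to \infty} \bar{v}_i(e_1) = g.$$

Then one would expect from (11) that the following relation holds:

$$g + v_\infty(\Pi) = \min_\alpha \left[ H(\Pi\,P\,A^\alpha) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha,k}\,u\ \ v_\infty\!\left(\frac{\Pi\,P\,B^{\alpha,k}}{\Pi\,P\,B^{\alpha,k}\,u}\right) \right] = H\big(\Pi\,P\,A^{\alpha_\infty(\Pi)}\big) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha_\infty(\Pi),k}\,u\ \ v_\infty\!\left(\frac{\Pi\,P\,B^{\alpha_\infty(\Pi),k}}{\Pi\,P\,B^{\alpha_\infty(\Pi),k}\,u}\right) \qquad (13)$$

The optimal strategy would be to use the policy $\alpha_\infty(\Pi_i(y_{[0,i)}))$ at all times $i$, and one expects $\lim_{T \to \infty} \frac{1}{T}\,H(Y_{[0,T)}) = g$.

This is made precise in the following theorem.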


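THEOREM VII

If there exists a bounded real valued function $v_*$, a function $\alpha_*$ and constants $g_1$ and $g_2$ such that for $\Pi$ in $\Sigma^\sigma$

$$g_1 + v_*(\Pi) \le \min_\alpha \left[ H(\Pi\,P\,A^\alpha) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha,k}\,u\ \ v_*\!\left(\frac{\Pi\,P\,B^{\alpha,k}}{\Pi\,P\,B^{\alpha,k}\,u}\right) \right]$$

$$H\big(\Pi\,P\,A^{\alpha_*(\Pi)}\big) + \sum_{k=1}^{M} \Pi\,P\,B^{\alpha_*(\Pi),k}\,u\ \ v_*\!\left(\frac{\Pi\,P\,B^{\alpha_*(\Pi),k}}{\Pi\,P\,B^{\alpha_*(\Pi),k}\,u}\right) \le g_2 + v_*(\Pi) \qquad (14)$$

Then:

-- $g_1 \le g_2$

-- the entropy $H_*(Y_{[0,T)})$ corresponding to using policy $\alpha_*(\Pi_i(y_{[0,i)}))$ at all times $i$ has the property that

$$g_1 \le \liminf_{T \to \infty} \frac{1}{T}\,H_*(Y_{[0,T)}) \le \limsup_{T \to \infty} \frac{1}{T}\,H_*(Y_{[0,T)}) \le g_2 \qquad (15)$$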


and 
(16)


where  H^(Yfo.T))  results  from  a causal  strategy. 


Proof : 


Let  £1  :=  sup  v^(n)  - inf  v^,CII) 

n 

We  define  D fH)  :=  0 
o 


01*  (H)  M a*(n),k 

D.  . (n)  :=  H(nPA  ) + Z nPB  I 

^ k=l 


a*(n),k 

, nPB 

1 o.*(n);k 

nPB  c 


i=0.1,.. 


From  (7):  = 0^(11^)  . 

We  have  the  relation 

DnCn)  1 v.(n)  - inf  v*(n) 

n 

and  by  induction  on  i and  (13) 

(n)  1 i go  + v*(n)  - inf  v*(n)  i=l,2, . . . 

^ ^ n 

We  can  conclude  that 


Dt(IIo) 


-«2  " f 


thus  proving  that 

linkup  <82 

i.e.  the  right  hand  part  of  (15). 


We  also  have  the  relation 


v,(n)  - sup  Vjn)  < VqCH) 


and  by  induction  on  i and  (14) 


i + v.cn)  - sup  v^cn)  iv^(n)  i=i,2,... 
We  can  conclude  that 

i gi  - ‘^iv-cn^) 

Now,  if  (16)  is  not  true,  there  is  a strategy  such  that 

Q+1 

But  there  is  a T > such  that 

— e 

T ^*^[0,T)^  - T ^^^[0,T)^  ■"  ^ 

For  that  T , 

^ 1 . 1 


lul 


(17) 


> v^cHq)  + a + 1 
which  contradicts  (17). 

Thus (16) is true, and the left hand part of (15) follows.

Q.E.D. 

This theorem asserts that if one can find functions $v_*$, $\alpha_*$, and constants $g_1$ and $g_2$, e.g. by using algorithm (12), one can bound the optimal performance, and one can find a strategy performing within $g_2 - g_1$ of the optimum. Theorem VI guarantees that $g_2 - g_1$ does not increase as one progresses in algorithm (12). Note that convergence can be hastened in (12) by damping [Schweitzer, 1971], i.e. defining $v_{i+1}(\Pi) := \lambda\big(\bar{v}_{i+1}(\Pi) - \bar{v}_{i+1}(e_1)\big) + (1-\lambda)\,v_i(\Pi)$ for some well chosen $\lambda$ in $(0,1]$.


COROLLARY VII.1: If there is a bounded real valued function $v_\infty$, a function $\alpha_\infty$ and a real number $g$ such that (13) is satisfied, the strategy consisting of using policy $\alpha_\infty(\Pi_i(y_{[0,i)}))$ at all times $i$ is optimal, and $\lim_{T \to \infty} \frac{1}{T}\,H(Y_{[0,T)}) = g$.

Proof:

Make $g_1 = g_2 = g$ in Theorem VII.

Q.E.D.

Note that nothing in this corollary guarantees the existence of an optimal strategy.

Note also that if a policy $\alpha(\Pi_i)$ is used at all times $i$, the $\Pi_i$'s themselves form a stationary Markov Process in the simplex of $\mathbb{R}^\sigma$, and the probability distribution of $\Pi_{i+1}$ given $\Pi_i$ can be computed. Our problem can be seen as a Markovian decision theory problem with observable state (i.e. $\Pi_i$). These problems have been extensively studied, especially in the finite dimensional case (see [Kushner, 1971]). Contrary to what is usually done, the proof of Theorem VII carefully avoids the use of the stationary distribution of the $\Pi_i$'s, which is not guaranteed to exist, because the hypotheses are not very restrictive.

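EXAMPLE: $M = 2$, $N = 2$ as before

Let $\Pi = (\rho_1, \rho_2)$.

Equation (13) takes the form, where we use $v_\infty(\rho_1)$ in place of $v_\infty((\rho_1, \rho_2))$,

$$g + v_\infty(\rho_1) = \min\Big\{\ \mathcal{H}(\rho_1 + \rho_2 p_1) + (1-\rho_1)\,p_2\, v_\infty(0) + (\rho_1 + \rho_2 p_1)\, v_\infty\!\Big(\frac{\rho_1 p_1}{\rho_1 + \rho_2 p_1}\Big),\ \ \mathcal{H}(\rho_1 p_1) + \rho_1 p_1\, v_\infty(1) + (1 - \rho_1 p_1)\, v_\infty\!\Big(\frac{p_1 + \rho_1 (p_2 - p_1)}{1 - \rho_1 p_1}\Big)\ \Big\}$$

The first argument in min(.,.) corresponds to policy 1, the second to policy 2.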


We have solved numerically this example for different values of $p_1$ by discretizing the unit simplex of $\mathbb{R}^2$ (51 points) and using the algorithm (12). Results appear in Table 5.2.

Table 5.2: g corresponding to an optimal strategy (M = 2, N = 2)

   p_1     g

   .5     .60
   .6     .58
   .7     .51
   .8     .41
   .9     .25
   .95    .14

In all cases, an optimal strategy turns out to be: use policy 2 when $\rho_1 \le .5$.

Note that, if $p_1 = .5$, this is exactly what Strategy V of Section C does.

For $p_1 \ne .5$ this result shows that the strategy of always minimizing the expected immediate cost is not optimal.
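A computation of this kind can be sketched as follows. This is not the program used for Table 5.2 but a minimal reconstruction under stated assumptions: $\rho_1$ is the probability that the single message left in the buffer comes from source 1, policy 1 (resp. 2) sends a "1" (resp. "2") when both origins are present, the relative value function is stored on a uniform grid and interpolated linearly, and the damped iteration of [Schweitzer, 1971] is used. All function names and the parameters `points`, `sweeps` and `damping` are choices made here. The function returns the pair of bounds $(g_1, g_2)$ on the optimal cost of the discretized problem, in the spirit of Theorem VII.

```python
import math


def binary_entropy(x):
    """Entropy in bits of the distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def interpolate(values, x):
    """Linear interpolation of grid values on [0, 1] at the point x."""
    n = len(values) - 1
    pos = min(max(x, 0.0), 1.0) * n
    low = min(int(pos), n - 1)
    w = pos - low
    return (1 - w) * values[low] + w * values[low + 1]


def one_step(values, rho, p1):
    """Right hand side of the optimality equation for M = 2, N = 2."""
    p2 = 1 - p1
    # Policy 1: a "1" is sent unless the buffer holds two messages from source 2.
    q = rho + (1 - rho) * p1
    cost1 = (binary_entropy(q) + (1 - q) * interpolate(values, 0.0)
             + q * interpolate(values, rho * p1 / q))
    # Policy 2: a "1" is sent only if the buffer holds two messages from source 1.
    r = rho * p1
    cost2 = (binary_entropy(r) + r * interpolate(values, 1.0)
             + (1 - r) * interpolate(values, (p1 + rho * (p2 - p1)) / (1 - r)))
    return min(cost1, cost2)


def gain_bounds(p1, points=51, sweeps=2000, damping=0.5):
    """Damped relative value iteration; returns (g1, g2) bracketing the gain."""
    grid = [k / (points - 1) for k in range(points)]
    values = [0.0] * points
    for _ in range(sweeps):
        diff = [one_step(values, rho, p1) - values[k] for k, rho in enumerate(grid)]
        # Keep the value at rho_1 = 1 (perfect state knowledge) pinned at zero.
        values = [values[k] + damping * (diff[k] - diff[-1]) for k in range(points)]
    diff = [one_step(values, rho, p1) - values[k] for k, rho in enumerate(grid)]
    return min(diff), max(diff)
```

Calling `gain_bounds(0.5)` and `gain_bounds(0.9)` gives brackets that can be compared with the first and fifth rows of Table 5.2; small differences are to be expected from the interpolation.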

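It would be pleasant to prove analytically that the strategy described above is optimal. In the case $p_1 = p_2 = \tfrac{1}{2}$, this would involve finding a bounded function $v_\infty$ and $g$ verifying

$$g + v_\infty(\rho_1) = \mathcal{H}\!\left(\frac{\rho_1}{2}\right) + \frac{\rho_1}{2}\, v_\infty(1) + \frac{1}{2}(2 - \rho_1)\, v_\infty\!\left(\frac{1}{2 - \rho_1}\right)$$

for $\rho_1 \le .5$, and a similar expression for $\rho_1 \ge .5$. By symmetry one expects $v_\infty(x) = v_\infty(1-x)$, so $v_\infty$ and $g$ must satisfy

$$g + v_\infty(\rho_1) = \mathcal{H}\!\left(\frac{\rho_1}{2}\right) + \frac{\rho_1}{2}\, v_\infty(0) + \frac{1}{2}(2 - \rho_1)\, v_\infty\!\left(\frac{1 - \rho_1}{2 - \rho_1}\right)$$

In this expression, all the arguments of $v_\infty$ are between 0 and .5.

Once this function is found, one should prove that it satisfies (13).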

Before closing this section, we make a brief historical review. Our problem is essentially the problem of controlling a Partially Observable Markov Process. We solve it by working in the simplex of $\mathbb{R}^\sigma$, where the $\Pi_i$'s form a Markov Process if a policy $\alpha_i(\Pi_i)$ is used at times $i$. The problem is thus "reduced" to a Markov decision problem with observable state. The idea of doing this has become classical starting with [Drake, 1962]. One can find more references in Section 4 of [Platzman, 1976]. This last work is an attempt to control Partially Observable Markov Processes without making the transformation to the $\Pi$ space, and is also an excellent review of the state of the art.

We should point out that the Partially Observable Markov Processes studied in the literature are simpler than what is considered here, because their immediate cost is only a function of the state of the original process, and the policy. Thus the expected immediate cost at time $T-i$ if policy $\alpha$ is used has the form

$$C_i(y_{[0,T-i)}) = \Pi_{T-i}(y_{[0,T-i)})\, q^\alpha$$

for some column $\sigma$-tuple $q^\alpha$.

This compares with $C_i(y_{[0,T-i)}) = H\big(\Pi_{T-i}(y_{[0,T-i)})\,P\,A^\alpha\big)$ in our case.

However the nice properties of continuity and concavity of the functions $V_i(\Pi)$ in the simpler problem are conserved here. [Platzman, 1976] gives sufficient conditions on the matrices for an optimal solution to exist in the simpler case; it seems that these conditions would still be sufficient here. However, they are extremely cumbersome to verify.

G. Suggestions for Future Work

Although it is not of immediate practical use, it would be worthwhile to prove that an optimal solution exists, that it verifies (13), and that $v_\infty$ is continuous and concave.

It would be especially interesting to find analytic expressions for $v_\infty$ and $\alpha_\infty$, at least for simple cases. We conjecture that $v_\infty(e_k) = v_\infty(e_1)$, $k = 1, 2, \ldots, \sigma$, i.e. that the relative values of perfect state knowledge are the same, regardless of the state.

One should try to prove or disprove the possibility that $\alpha_\infty(\Pi)$ always belongs to the special class of policies considered in Theorem II.

Finally, we assumed until now that the $p_i$'s were known. One should find robust strategies (e.g. a minimax strategy) that could be used when the source statistics are imperfectly known.
4. Analysis of Practical Strategies

A. Notation and Organization

Throughout Sections 4 to 8 we will consider a model where source $i$, $i = 1, 2, \ldots, M$, emits messages in a Poisson manner with rate $\lambda_i := 1/E a_i$, with $\lambda_T := \sum_{i=1}^{M} \lambda_i$, where every message contains a codeword indicating its length and where the lengths of the messages from source $i$ have their own probability distribution. We assume that the message lengths and interarrival times are independent random variables. We will attempt to compute the expected message waiting time for different strategies indicating the message origins.

In Section B we will quickly study the equivalent of Strategy I of Section 3.B: the concentrator transmits the messages in the order they were received, and prefixes each of them with a codeword indicating its origin.

In Section 5 we analyse some variants of Strategy II of Section 3.B: periodically the concentrator sends a codeword indicating the state of its buffer, then empties it. This will lead to a source coding problem of independent interest that will be treated in Section 6.

Section 7 will see the computation of the average message waiting time in cyclic strategies, where the concentrator serves all messages from source $i$ present in the buffer, then all messages from source $i+1$, and so on. Finally we will discuss all results in Section 8.
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B.  Analysis  of  the  First-In-First-Out  Strategy 


We send the messages in the order they were received, prefixing
a message from source i with a codeword of length $n_i$. We must
also specify what to do when the line is idle. In that case we use
the same policy as in Section 2 of Chapter 4, i.e. we insert a flag
of length $v_1$ at the end of a busy period, then flags of length $v_2$
if no arrival occurred during the transmission of the previous flag.
Note that the flags and the codewords must be chosen jointly, so that
the probability of an insertion in a message will depend on the origin
of the message. We denote by $p_i^j$ the probability that the flag of
length $v_j$ causes an insertion in a message from source i.

We will use the formulas developed in Chapter 4 to compute the
average message delay with the following identification:

b' = 0 (we include the message lengths in $f_1$ and $f_2$)
$f_1$ = message length + codeword length + possible insertion
due to the flag of length $v_1$

thus

$$E f_1 = \frac{1}{\lambda_T} \sum_{i=1}^{M} \lambda_i \left( E b_i + n_i + p_i^1 \right)$$

$$E f_1^2 = \frac{1}{\lambda_T} \sum_{i=1}^{M} \lambda_i \left( E(b_i + n_i)^2 + p_i^1 + 2 p_i^1 E(b_i + n_i) \right)$$

$f_2$ will be defined as in Section 4.4 with
$c_j = v_j$, j=1,2,
$d_1 = d_2$ = message length + codeword length + possible insertion
due to the flag of length $v_2$



thus 


= 


1 , e 

T — ♦ V, 


■^'^1  , M 


'1 


1-e 


M 

Ef^  * i * 2v^/X^  E X.  (Eb.  * n + p^) 

i=l 


■Vi 


1-e 

M 
E 

"T  i=l 


T 2 


2 M 

V2  + 2V2/X^  E X.  (Eb.^n.^pp- 
i*l 


^ E X.  (E(b.+n.)^  + P?  + 2p?E(b.+n.)) 


and one obtains from formula (1) of Chapter 4 that the average message
delay is given by


M 

E X.  CECb  * p!  ♦ 2p'ECb, 

ECd)  . ^ x,ceb.n,3  . 1 ; 


1 - E X.  (Eb.*n.+p:) 
. , 1 1 1 


2 2Vi  M , '‘‘j  v,  m 

Vi  + -J—  E X.p.  + r— — (v2  + 2^  E x.p.)  + 

, i = l ^ , Vl  ^ i = l ^ ^ 

^ l_-e 

2 -X^v^ 

e 

V,  + v,  + 

1 -X_v-  2 


1-e 


>1-2 


M 

1 2 

^ E X.  (pf  - p.') 

^i=l  ^ ^ ^ 

1 2 

r ^ - Pi^ 

i»l  ^ ^ 


C3) 


If  a the  last  term  simplifies  to 


V,  MX. 

1 ^ 1 , 

2~  * ^ Pi 
i«l  ^ 


C4) 



It is of interest to see how E(d) behaves in light and in
heavy traffic. In light traffic, the second term is negligible, so one
sees that the codewords should come from a Huffman code, so as to
minimize $\sum_{i=1}^{M} \lambda_i n_i$, and $v_1$ and $v_2$ should be small, say
$v_1 = v_2 = 2$, as we will discuss in Section 4.7. If all $\lambda_i$'s are more
or less equal,

$$E(d) = \frac{1}{\lambda_T} \sum_{i=1}^{M} \lambda_i E b_i + \log_2 M + 1.5$$

and increases with $\log_2 M$.

In heavy traffic the second term will dominate, and it will be
of primary importance to maximize its denominator, thus again using a
Huffman code, and using a large $v_1$. If all $\lambda_i$'s are more or less
equal, we can have stability if $\sum_{i} \lambda_i (E b_i + \log_2 M) < 1$. Thus if
$E b_i$ is of the order of $\log_2 M$ or smaller, the maximum throughput of
the system will be much reduced by the presence of the codewords.
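
The size of this throughput penalty can be illustrated numerically. The sketch below assumes equal source rates, a fixed header of exactly $\log_2 M$ bits, and ignores flags and insertions; the function name and the example values are hypothetical and not taken from the text.

```python
import math

def fifo_header_capacity(num_sources, mean_length_bits):
    """Stability limit of the FIFO-with-header scheme under the stated assumptions.

    Returns (largest total arrival rate in messages per bit time,
             fraction of the line left for message data).
    """
    header_bits = math.log2(num_sources)          # assumed fixed-length origin codeword
    max_rate = 1.0 / (mean_length_bits + header_bits)
    data_fraction = mean_length_bits / (mean_length_bits + header_bits)
    return max_rate, data_fraction

# Hypothetical example: 10-bit messages, growing number of sources.
for M in (4, 64, 1024):
    rate, fraction = fifo_header_capacity(M, mean_length_bits=10.0)
    print(M, round(rate, 4), round(fraction, 3))
```

With messages of about $\log_2 M$ bits, roughly half of the line is spent on origin codewords, which is the effect described above.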

5 . Strategies  Indicating  the  Buffer  State. 

A.  Introduction 

We study in this section a class of strategies where periodically
the concentrator samples the buffer, makes known the state of the buffer
to the deconcentrator, then transmits all the messages that were present
in the buffer at the sampling time.

In addition to the notation introduced in Section 4.A, we call
the time intervals between two sampling points the scanning times, and
we denote them by $s_i$, i=1,2,... We denote by $m_j^i$ the number of
arrivals from source j during $s_i$, and by $v_i$ the (variable) length
of the codeword used to indicate the state of the buffer, i.e. the $m_j^i$'s,
j=1,2,...,M, at the end of $s_i$. Note that $s_i$ is known to the receiver.
Thus an interesting problem is to find a code minimizing $E(v_i \mid s_i)$.

This code will be very complicated, because it will jointly encode the
$m_j^i$'s. However, since the $m_j^i$'s are conditionally independent given $s_i$,
nothing will be lost by encoding them separately, except some redundancy.

If we encode the $m_j^i$ separately, the problem is to find a minimum
average codeword length code for a Poisson random variable. This is
still challenging because the number of codewords will be infinite, so
that Huffman's procedure [Huffman, 1952] cannot be applied directly.
We solve this problem in Section 6.

Here our goal is to find the average message delay, and we proceed
to do so.


B.  Statistics  of  the  Scanning  Times 


Because the arrivals are Poisson, the scanning times form a
Markov chain which is irreducible because, for any value of $s_i$, there
is a nonzero probability of no arrivals during $s_i$.

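We have the relation

$$E\left(e^{-x s_{i+1}} \mid m_1^i, \ldots, m_M^i, s_i\right) = E\left(e^{-x v_i} \mid m_1^i, \ldots, m_M^i, s_i\right) \prod_{j=1}^{M} \left(B_j^*(x)\right)^{m_j^i}, \quad \mathrm{Re}\, x \ge 0 \qquad (5)$$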


Of course we want to average this, which is possible analytically only
if $E\left(e^{-x v_i} \mid m_1^i, \ldots, m_M^i, s_i\right)$ has a sufficiently simple form. In particular
it is not possible if $v_i$ results from the algorithm of Section 6. We
will restrict ourselves to strategies where $E\left(e^{-x v_i} \mid m_1^i, \ldots, m_M^i, s_i\right)$
has the form

$$\prod_{j=1}^{M} V_{0j}^*(x) \left(V_{1j}^*(x)\right)^{m_j^i} \qquad (6)$$

We will also require that $\sum_{j=1}^{M} E v_{0j} > 0$. Otherwise infinitely many
scanning times could take place in no time. Without causing any
difficulty we could add a factor $(V^*(x))^{s_i}$ in (6), but this would
be fruitless. We will examine codes that have the above property after
finishing the analysis of the scanning times statistics.

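We can now average (5) on $m_j^i$, the number of Poisson arrivals
during $s_i$, to obtain

$$E\left(e^{-x s_{i+1}} \mid s_i\right) = \prod_{j=1}^{M} V_{0j}^*(x) \, \exp\left( s_i \sum_{j=1}^{M} \lambda_j \left( V_{1j}^*(x) B_j^*(x) - 1 \right) \right), \quad \mathrm{Re}\, x \ge 0$$

Denoting $E\, e^{-x s_i}$ by $S_i^*(x)$ we have

$$S_{i+1}^*(x) = \prod_{j=1}^{M} V_{0j}^*(x) \; S_i^*\left( \sum_{j=1}^{M} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right), \quad \mathrm{Re}\, x \ge 0 \qquad (7)$$

Defining

$$V_0^*(x) := \prod_{j=1}^{M} V_{0j}^*(x)$$

$$f^0(x) := x$$

$$f^1(x) := \sum_{j=1}^{M} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right), \quad \mathrm{Re}\, x \ge 0$$

$$f^i(x) := f^1\left(f^{i-1}(x)\right), \quad i \ge 1, \quad \mathrm{Re}\, x \ge 0$$

we can rewrite (7) as

$$S_i^*(x) = \prod_{j=0}^{i-1} V_0^*\left(f^j(x)\right) \; S_0^*\left(f^i(x)\right), \quad \mathrm{Re}\, x \ge 0$$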
We will show now that if $\rho_T := \sum_{j=1}^{M} \lambda_j (E v_{1j} + E b_j) < 1$ and if
$\lim_{x \downarrow 0} \prod_{i=0}^{\infty} V_0^*(\rho_T^i x) = 1$, then, for x real, $S^*(x) := \lim_{i \to \infty} S_i^*(x)$ is
independent of $S_0^*$ and is continuous at 0. Thus [Feller, 1966, p.
431] the process $s_i$ is positive recurrent and $S^*$ is given by

$$S^*(x) = \prod_{j=0}^{\infty} V_0^*\left(f^j(x)\right), \quad \mathrm{Re}\, x \ge 0$$

which is suitable for numerical computation. The proof is simple: by
convexity one has immediately that

$$f^1(x) \le \sum_{j=1}^{M} \lambda_j \left[ -\frac{d}{dy} \left( V_{1j}^*(y) B_j^*(y) \right) \right]_{y=0} x = \rho_T x$$

Thus $f^i(x) \le \rho_T^i x$ and $\lim_{i \to \infty} f^i(x) = 0$, so $\lim_{i \to \infty} S_0^*(f^i(x)) = 1$.

To be able to use the reference just mentioned we need
$\lim_{x \downarrow 0} \prod_{j=0}^{\infty} V_0^*(f^j(x)) = 1$, which is insured by
$\lim_{x \downarrow 0} \prod_{j=0}^{\infty} V_0^*(\rho_T^j x) = 1$, because $V_0^*(x)$ is
decreasing and upperbounded by 1 for $x \ge 0$. Note that this condition
and the continuity of $S^*(x)$ at 0 are guaranteed if $\sum_j E v_{0j} < \infty$, but
this is not necessary.


Note also that if $\rho_T = 1$, $f^1(x) = x + o(x)$; thus if $\prod_{i=0}^{\infty} V_0^*(f^i(x))$
converges to a number different from 0, $S^*$ will depend on $S_0^*$, whereas
if $S^*(x) = 0$ for $x > 0$, $S^*$ is not the Laplace-Stieltjes transform of
a probability distribution. At any rate, the process $s_i$ is not
positive recurrent if $\rho_T = 1$.

From (5), if the process is positive recurrent, $S^*$ satisfies the
relation

$$S^*(x) = V_0^*(x) \; S^*\left( \sum_{j=1}^{M} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right) \qquad (8)$$

and is a decreasing function of x, as is $V_0^*$.
Thus (8) implies that

$$x \ge \sum_{j=1}^{M} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right), \quad x \ge 0.$$

Dividing by x and taking the limit as $x \downarrow 0$, we have that $\rho_T \le 1$.

We can conclude that $\rho_T < 1$ is a necessary condition for the
$s_i$ process to be positive recurrent. We are unable to prove that it
is sufficient; we still need a condition on $V_0^*$. From now on we will
assume that $\rho_T < 1$ and $\sum_{j=1}^{M} E v_{0j} < \infty$, and we will consider only
the stationary system (i.e. $S_0^* = S^*$).

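Taking the values at x=0 of the derivatives of $S^*$ in (8)
one finds

$$E s = \frac{\sum_{j=1}^{M} E v_{0j}}{1 - \sum_{j=1}^{M} \lambda_j (E v_{1j} + E b_j)} \qquad (9)$$

$$E s^2 - (E s)^2 = \frac{(E s) \sum_{j=1}^{M} \lambda_j E(v_{1j} + b_j)^2 + \sum_{j=1}^{M} \mathrm{var}(v_{0j})}{1 - \left( \sum_{j=1}^{M} \lambda_j (E v_{1j} + E b_j) \right)^2}$$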

C.  Description  of  the  Code 

A coding scheme that satisfies (6) is to use a unary code to
encode each $m_j^i$. In that case $V_{0j}^*(x) = V_{1j}^*(x) = e^{-x}$. Note that
it is not necessary to transmit all the codewords at the beginning of
the scanning time. We can transmit first the codeword specifying $m_1$,
then the $m_1$ messages from source 1, etc. A more efficient form of
the same code is to prefix every message with a "1", and to transmit
a "0" when all messages from source 1 have been transmitted. This
has a favorable effect on the message waiting times.

We will consider a generalization of this strategy, using
flags. We transmit first all messages from source 1, then a flag of
length $v_1$, then all messages from source 2, etc. Under the usual
assumptions (see Chapter 3)

$$V_{0j}^*(x) = \exp(-x v_j), \quad V_{1j}^*(x) = 1 - 2^{-(v_j - 1)} \left( 1 - e^{-x} \right), \quad j = 1, 2, \ldots, M$$
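
The "1"-prefix, "0"-terminator form of the unary code can be sketched as follows. To keep the example self-contained it assumes that every message is a bit string of a known fixed length, which stands in for the length codeword that each message carries in the model; the function names are hypothetical.

```python
def encode_scan(queues):
    """Encode one scan; queues[j] lists the message bit strings of source j+1."""
    bits = []
    for messages in queues:
        for message in messages:
            bits.append("1" + message)   # "1" announces one more message
        bits.append("0")                 # "0" closes the current source
    return "".join(bits)

def decode_scan(bits, num_sources, message_length):
    """Invert encode_scan, assuming fixed-length messages."""
    queues, pos = [], 0
    for _ in range(num_sources):
        messages = []
        while bits[pos] == "1":
            messages.append(bits[pos + 1:pos + 1 + message_length])
            pos += 1 + message_length
        pos += 1                         # skip the closing "0"
        queues.append(messages)
    return queues

scan = [["1010", "0000"], [], ["1111"]]
encoded = encode_scan(scan)
print(encoded, decode_scan(encoded, num_sources=3, message_length=4) == scan)
```

The protocol overhead is one bit per message plus one bit per source, which corresponds to $V_{0j}^*(x) = V_{1j}^*(x) = e^{-x}$.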

D.  The  Waiting  Time 


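If the service discipline for each source is first in first out,
the waiting time $w_i$ of a message arriving from source i, u units
of time before the end of a scanning time of length z, is equal to u
plus the sum of the lengths and insertions of the messages from sources
1,2,...,i-1 that arrive during z, plus the sum of the lengths and
insertions of the messages from i that are already in the queue, plus
the flags 1,2,...,i-1 plus a possible insertion. Thus

$$E\left(e^{-x w_i} \mid u, z\right) = e^{-xu} \exp\left( -z \sum_{j=1}^{i-1} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right) \exp\left( -(z-u) \lambda_i \left( 1 - V_{1i}^*(x) B_i^*(x) \right) \right) \prod_{j=1}^{i-1} V_{0j}^*(x) \; V_{1i}^*(x), \quad \mathrm{Re}\, x \ge 0$$

u is uniformly distributed between 0 and z, because the arrivals
are Poisson, so

$$E\left(e^{-x w_i} \mid z\right) = \prod_{j=1}^{i-1} V_{0j}^*(x) \exp\left( -z \sum_{j=1}^{i-1} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right) V_{1i}^*(x) \; \frac{\exp\left( -z \lambda_i \left( 1 - V_{1i}^*(x) B_i^*(x) \right) \right) - \exp(-zx)}{z \left( x + \lambda_i \left( V_{1i}^*(x) B_i^*(x) - 1 \right) \right)}$$

Using the statistics of z developed in Appendix B one obtains:

$$W_i^*(x) := E\left(e^{-x w_i}\right) = \frac{\prod_{j=1}^{i-1} V_{0j}^*(x) \; V_{1i}^*(x)}{E s \left( x + \lambda_i \left( V_{1i}^*(x) B_i^*(x) - 1 \right) \right)} \left[ S^*\left( \sum_{j=1}^{i} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right) - S^*\left( x + \sum_{j=1}^{i-1} \lambda_j \left( 1 - V_{1j}^*(x) B_j^*(x) \right) \right) \right]$$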

Differentiating one obtains the moment

$$E w_i = \sum_{j=1}^{i-1} E v_{0j} + E v_{1i} + \frac{E s^2}{2 E s} \left( 1 + 2 \sum_{j=1}^{i-1} \rho_j + \rho_i \right) \qquad (10)$$

where $\rho_j := \lambda_j (E v_{1j} + E b_j)$.

One can find from this an expression for the average message waiting
time, $E w := \frac{1}{\lambda_T} \sum_{i=1}^{M} \lambda_i E w_i$. In general this expression is quite long
to write and depends on the ordering of the sources. The only statement
that we are able to make about the ordering that minimizes the
average waiting time is that if $E v_{0j} = E v_{0i}$ and $\rho_i = \rho_j$ one should
have $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_M$, or equivalently,
$E v_{11} + E b_1 \le E v_{12} + E b_2 \le \ldots$, as expected.

If $\lambda_i = \lambda$, $E v_{0i} = E v_0$, $\rho_i = \rho$ and $E v_{1i} = E v_1$, one checks that

$$E w = \frac{M-1}{2} E v_0 + E v_1 + \frac{E s^2}{2 E s} (1 + \rho_T)$$

If  in  addition  E (v  + b.)  =9  and  var  v..  = p , we  can  use  / 

Ij  j Oj  ' 

and  (10)  to  obtain 

P M-l  , ''®^0  S'® 

* i— 77  -2 2C1-Ot)  * "S 

Ev„  'lEv  X 9 .mo^/Es 

■ - ^ - - 2(--  p,r  - ™ 

Thus if in our coding scheme we use flags of length v, and if the
sources are identical, the average waiting time is given by


Ew  = ^ 


~ 2 * 


1 - X^CEb  + 2‘*^'''^^)  1 - X^(Eb  + 


One sees that in light traffic the first two terms will dominate,
especially when M is large. In heavy traffic, the presence of the
protocol does not affect the capacity of the line if one chooses v
large enough.
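
The identical-source case can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the closed form printed above: it assumes flags of length v (so $E v_0 = v$, $\mathrm{var}\, v_0 = 0$), a one-bit insertion with probability $2^{-(v-1)}$, the moments $E s = M v / (1 - \rho_T)$ and $\mathrm{var}\, s = E s \, \lambda_T E(b + \text{insertion})^2 / (1 - \rho_T^2)$, and the mean wait $E w = \frac{M-1}{2} v + 2^{-(v-1)} + \frac{E s^2}{2 E s}(1 + \rho_T)$. The function name and parameters are hypothetical.

```python
def mean_wait_identical_sources(M, lam_T, v, Eb, Eb2):
    """Average waiting time for M identical sources separated by flags of length v.

    lam_T is the total arrival rate, Eb and Eb2 the first two moments of the
    message length. All formulas are the assumptions listed in the lead-in.
    """
    p_insert = 2.0 ** (-(v - 1))                  # probability of a one-bit insertion
    rho_T = lam_T * (Eb + p_insert)
    if rho_T >= 1.0:
        raise ValueError("unstable: rho_T must be below 1")
    Es = M * v / (1.0 - rho_T)
    second_moment = Eb2 + 2.0 * p_insert * Eb + p_insert   # E(b + insertion)^2
    var_s = Es * lam_T * second_moment / (1.0 - rho_T ** 2)
    Es2 = var_s + Es ** 2
    return (M - 1) / 2.0 * v + p_insert + Es2 / (2.0 * Es) * (1.0 + rho_T)

# Hypothetical example: 4 sources, 3-bit flags, deterministic 10-bit messages.
for lam_T in (0.0, 0.05, 0.09):
    print(lam_T, mean_wait_identical_sources(4, lam_T, v=3, Eb=10.0, Eb2=100.0))
```

At zero load the result reduces to the protocol terms alone, and a larger v lowers the insertion probability and hence $\rho_T$, in line with the heavy-traffic remark above.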


6. Optimal Source Coding for a Class of Integer Alphabets

For finite source alphabets, the Huffman coding algorithm
[Huffman, 1952] yields a minimum expected codeword length code
satisfying the prefix condition. Although it cannot be applied
directly to countably infinite alphabets, its optimality can be
used to develop optimal codes for these sources, as [Golomb, 1966]
and [Gallager and Van Voorhis, 1975] did in
the case of geometric probability distributions. We show that for
a large class of probability measures, including those whose tail
decreases faster than geometrically with a ratio equal to .618,
the coding problem can be reduced to a finite one, to which Huffman's
procedure is applicable. This result hinges on the observation
that if the tail of a probability measure decreases monotonically,
no matter how fast, the codeword lengths of an optimum code must
not increase faster than linearly, with slope 1, for otherwise
some prefixes will not be used. This leads to the coding procedure
developed in Theorem 1.

Theorem  1 

Let p(.) be a probability measure on the set of nonnegative
integers. Assume there is a nonnegative integer m such that for all
j > m and i < j, the following hold:

$$p(i) \ge p(j) \qquad (1a)$$

$$p(i) \ge \sum_{k=j+1}^{\infty} p(k) \qquad (1b)$$

Then a binary prefix condition code with minimum average codeword
length for the alphabet consisting of the nonnegative integers with
the above probabilities is obtained by the following procedure:

Consider the reduced alphabet with letters 0,1,...,m+1 whose
probabilities are

$$p_1(i) = p(i), \quad i \le m$$

$$p_1(m+1) = 1 - \sum_{i=0}^{m} p(i)$$

Apply Huffman's coding procedure [Huffman, 1952] to this reduced
alphabet. Denote by $C_1(i)$ and $l_1(i)$ respectively the codeword and
codeword length for letter i ($C_1(i)$ is a sequence of binary
symbols), $0 \le i \le m+1$.

From there, construct the codewords C(i) for the original
alphabet by

$$C(i) = C_1(i), \quad i \le m$$
$$C(i) = \left( C_1(m+1),\; (i-m-1)*0,\; 1 \right), \quad i > m \qquad (2)$$

where n*0 denotes a sequence of n 0's.
Moreover, with this procedure the average codeword length $\bar{l}$
for the original alphabet is given by

$$\bar{l} = E(i) + l_1(m+1) - m + \sum_{i=0}^{m} \left( l_1(i) - l_1(m+1) + m - i \right) p(i)$$

where $E(i) = \sum_{i=0}^{\infty} i\, p(i) < m+2$.

Proof 

It is a simple matter to check that $\bar{l}$ is as given, and that,
because of hypothesis (1b), E(i) is finite:

$$E(i) = \sum_{i=0}^{\infty} \sum_{k=i+1}^{\infty} p(k) = \sum_{i=0}^{m} \sum_{k=i+1}^{\infty} p(k) + \sum_{i=m+1}^{\infty} \sum_{k=i+1}^{\infty} p(k)$$

$$\le \sum_{i=0}^{m} \sum_{k=i+1}^{\infty} p(k) + \sum_{i=m+1}^{\infty} p(i-1) < m+2$$
The codewords $C_1(i)$ satisfy the prefix condition, so it is clear
that the codewords C(i) also do. We show now that this code has
minimum average length, using the same technique as [Gallager and
Van Voorhis, 1975].

Let the letters 0,1,...,m+r of the "r-reduced" alphabet have
probabilities:

$$p_r(i) = p(i), \quad i < m+r$$

$$p_r(m+r) = \sum_{i=m+r}^{\infty} p(i)$$

The hypothesis ensures that, as long as r is greater than or equal
to 1, the smallest probabilities are $p_r(m+r-1)$ and $p_r(m+r)$.

Applying Huffman's procedure to the r-reduced alphabet, one verifies
that the codeword lengths of the first m+r letters in this
alphabet are the same as the lengths of the corresponding codewords
given in (2). So, denoting by $\bar{l}_r$ the average codeword length for
the r-reduced alphabet, $\bar{l}_r$ converges to $\bar{l}$ as r grows.

Let $\bar{l}_o$ be the minimum average codeword length for the original
alphabet, the minimum being taken over all uniquely decodable codes, so
that $\bar{l}_o \le \bar{l}$. We claim that $\bar{l}_r \le \bar{l}_o$ because we can obtain a uniquely
decodable code for the r-reduced alphabet by taking as codewords for
letters 0 to m+r-1 the corresponding codewords in the optimum code,
and choosing as codeword for letter m+r the shortest remaining codeword
in the optimum code. The average codeword length of the code so
obtained is not larger than $\bar{l}_o$, and is not smaller than $\bar{l}_r$, since
Huffman's procedure yields an optimal code. We conclude that
$\bar{l}_r \le \bar{l}_o$, but $\bar{l}_r$ converges to $\bar{l}$ as r increases, so $\bar{l} \le \bar{l}_o$.
Recalling that $\bar{l}_o \le \bar{l}$, we have $\bar{l} = \bar{l}_o$. Q.E.D.
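
The construction of Theorem 1 can be sketched in a few lines. This is a minimal illustration, assuming that a valid m is supplied by the caller (the hypotheses (1a) and (1b) are not checked) and using a standard heap-based Huffman routine; the function names and the Poisson example with mean 1.5 and m = 2 are hypothetical choices.

```python
import heapq
import itertools
import math

def huffman_code(probabilities):
    """Binary Huffman code for a finite alphabet; returns a list of bit strings."""
    tie_breaker = itertools.count()
    heap = [(p, next(tie_breaker), [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    codewords = [""] * len(probabilities)
    while len(heap) > 1:
        p0, _, letters0 = heapq.heappop(heap)
        p1, _, letters1 = heapq.heappop(heap)
        for i in letters0:
            codewords[i] = "0" + codewords[i]
        for i in letters1:
            codewords[i] = "1" + codewords[i]
        heapq.heappush(heap, (p0 + p1, next(tie_breaker), letters0 + letters1))
    return codewords

def reduced_code(pmf, m):
    """Huffman code C_1 for letters 0..m+1 of the reduced alphabet."""
    reduced = [pmf(i) for i in range(m + 1)]
    reduced.append(1.0 - sum(reduced))      # letter m+1 carries the whole tail
    return huffman_code(reduced)

def codeword(i, m, c1):
    """Codeword C(i) of equation (2): a unary tail appended to C_1(m+1)."""
    if i <= m:
        return c1[i]
    return c1[m + 1] + "0" * (i - m - 1) + "1"

def average_length(pmf, m, c1, mean):
    """Average codeword length from the closed form stated in Theorem 1."""
    tail_len = len(c1[m + 1])
    correction = sum((len(c1[i]) - tail_len + m - i) * pmf(i) for i in range(m + 1))
    return mean + tail_len - m + correction

lam, m = 1.5, 2   # hypothetical Poisson example
poisson = lambda i: math.exp(-lam) * lam ** i / math.factorial(i)
c1 = reduced_code(poisson, m)
print([codeword(i, m, c1) for i in range(6)])
print(average_length(poisson, m, c1, mean=lam))
```

Only the m+2 codewords of the reduced alphabet need to be stored; every longer codeword is generated on the fly from $C_1(m+1)$.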

The question then arises: how rapidly must p(.) decrease in
order to satisfy the hypothesis? A sufficient condition is that it
satisfies $p(i) \ge p(i+1) + p(i+2)$ for large i; a weaker condition
is that it decreases at least as fast as $g^i$, where $g = \frac{1}{2}(\sqrt{5}-1) = .61803$.
If $p(i) = p(i+1) + p(i+2)$, then $p(i) = p(0)\, g^i$, and hypothesis
(1b) is satisfied with equality for all i and j = i+1.

In  particular,  the  coding  procedure  developed  in  Theorem  1 is 
optimum  when  the  probability  measure  is  Poisson: 


$$p(i) = \frac{\lambda^i e^{-\lambda}}{i!}, \quad i = 0, 1, \ldots$$


The only problem is to find the smallest suitable value for m (as
defined in Theorem 1). One checks easily that p(i) increases with
i to a maximum value of p(r), where $r = \lceil \lambda \rceil - 1$, and then decreases
($\lceil x \rceil$ denotes the smallest integer not smaller than x). If n is the
smallest positive integer such that $p(n) \le p(0)$, the smallest we can
hope m to be is n-1 (a smaller m will not satisfy hypothesis (1a)).
Fortunately, this is so, and we can upperbound this m by $\lceil e\lambda \rceil - 1$,
as the following theorem will show. The size of the reduced alphabet
for which we must execute Huffman's procedure and maintain a codeword
table is thus a reasonable function of $\lambda$. In Table 5.3 we present $\lambda$ as a
function of m for small values of $\lambda$. In particular, if $\lambda \le 1$ then
m=0 so that the optimum code is unary and its average codeword length
is equal to $1 + \lambda$.

   m    upper limit of $\lambda$      m    upper limit of $\lambda$
        for that m                 for that m

   0        1.0000             15        6.8004
   1        1.4142             16        7.1770
   2        1.8171             17        7.5531
   3        2.2133             18        7.9289
   4        2.6051             19        8.3043
   5        2.9937             20        8.6794
   6        3.3800             21        9.0542
   7        3.7643             22        9.4287
   8        4.1471             23        9.8030
   9        4.5287             24        10.177
  10        4.9092             25        10.550
  11        5.2888             26        10.924
  12        5.6676             27        11.298
  13        6.0458             28        11.671
  14        6.4234             29        12.044

Table 5.3

Relation between $\lambda$ and m for Poisson distributions
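
The relation between the Poisson mean and m can be computed directly. Since $p(m+1) \le p(0)$ is equivalent to $\lambda^{m+1} \le (m+1)!$, the upper limit for a given m is $((m+1)!)^{1/(m+1)}$. The sketch below is an illustration built on that equivalence; the function names are hypothetical and the mean is assumed to be strictly positive.

```python
import math

def lambda_upper_limit(m):
    """Largest Poisson mean for which p(m+1) <= p(0): ((m+1)!) ** (1 / (m+1))."""
    return math.exp(math.lgamma(m + 2) / (m + 1))

def smallest_m(lam):
    """Smallest nonnegative m with p(m+1) <= p(0), i.e. lam**(m+1) <= (m+1)!."""
    m = 0
    while (m + 1) * math.log(lam) > math.lgamma(m + 2):
        m += 1
    return m

for lam in (0.5, 1.5, 7.0):
    m = smallest_m(lam)
    print(lam, m, lambda_upper_limit(m), math.ceil(math.e * lam) - 1)
```

Working with logarithms (via `math.lgamma`) avoids overflow for large m, and the last printed column is the upper bound on m quoted above.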


Theorem  2 


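If

$$p(i) = \frac{\lambda^i e^{-\lambda}}{i!}, \quad i = 0, 1, \ldots$$

and m is the smallest nonnegative integer such that $p(m+1) \le p(0)$,
then

a) $\lceil 2\lambda \rceil - 2 \le m \le \lceil e\lambda \rceil - 1$

b) $p(i) \ge \sum_{j=i+1}^{\infty} p(j)$, $i > m$

and thus (1) is satisfied by this m.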

Proof 

a) We will first upperbound m. By Stirling's inequality [Feller,
1968, p. 52]

$$i! > \left( \frac{i}{e} \right)^i$$

If $i \ge e\lambda$, then $i! > \lambda^i$, so that p(0) > p(i) and thus $m+1 \le \lceil e\lambda \rceil$.
(A more careful analysis shows that, when $\lambda$ is large, m is approximately
equal to $e\lambda - \frac{1}{2} \log 2\pi e\lambda - 1$.)

To lowerbound m, we note that the logarithm function is
concave downward so that

$$\log \frac{i+1}{2} = \log \left( \frac{1}{i} \sum_{j=1}^{i} j \right) \ge \frac{1}{i} \sum_{j=1}^{i} \log j = \frac{1}{i} \log i!$$

If $p(i) \le p(0)$, then $\frac{1}{i} \log i! \ge \log \lambda$ so that

$$\log \frac{i+1}{2} \ge \log \lambda \qquad (3)$$

and thus $\frac{m+2}{2} \ge \lambda$.

b)

$$\sum_{j=i+1}^{\infty} p(j) = p(i) \left[ \frac{\lambda}{i+1} + \frac{\lambda^2}{(i+1)(i+2)} + \ldots \right] \le p(i) \, \frac{\lambda/(i+1)}{1 - \lambda/(i+1)}$$

From inequality (3), if $p(i) \le p(0)$, $\frac{\lambda}{i+1} \le \frac{1}{2}$.
This yields the desired result.

7. Analysis of Cyclic Strategies

A. Introduction

We give in Sections A to F a complete analysis of the average
message waiting time for two important cyclic queueing systems. No
explicit reference to the application of these systems to the encoding
of message origins is made before Section G.

Communication and computer systems in which a single server is
shared among several queues are common. For example, in a concentrator,
messages arriving from many sources must be sent on one output line. In
time-shared computer systems, a central computer must provide service
to several users. The queueing models presented here may be useful in
the analyses of these and similar problems.

Consider a node with M incoming communication links and one
outgoing link. Digital messages arriving on link i are queued in a
buffer i of infinite capacity. Periodically a "server" empties the
queues and transmits the messages on the outgoing link. We will study
the average waiting time in each queue under two service disciplines.

In the first, referred to as the "please wait" discipline, the server
serves only those messages already present in queue i when he arrives,
then switches to queue i+1, which takes a random time, and goes on in
cycle, visiting each queue once in a cycle. In the second discipline, the
"exhaustive service" discipline, the server empties queue i completely,
then spends a random time switching to i+1 and continues the cycle. The
random time between queues can be viewed as being used for transmission
of protocol.

In both cases the i-th queue is characterized by a Poisson input
with parameter $\lambda_i$ messages per time unit and a service time with mean
$1/\mu_i$ time units per message and second moment $\theta_i$. The switching times to
queue i have mean $\nu_i$ time units and variance $\sigma_i^2$. We assume all
interarrival, service and switching times to be independent.

Approximate studies have been made by [Leibowitz, 1961] and
[Kruskal, 1969]. [Cooper and Murray, 1969] and [Cooper, 1970] studied
both disciplines in the case of zero switching times. [Eisenberg, 1972]
considered a more general configuration for the server cycle and allowed
nonzero switching times. He solved the problem of the exhaustive service
discipline. [Konheim and Meister, 1974] solved the discrete-time
equivalent of the exhaustive service problem in the case where the queues
are identical. In addition, numerous authors referred to in [Eisenberg,
1972] studied the system of queues when M=2.

This research was pursued before the publication of [Carsten et
al, 1977], which analyzes the "please wait" case by a method similar to
ours. The rate of convergence of the algorithm presented in the paper
just mentioned is not as claimed there, as will be shown in Section F.

Our solution differs from previous studies in the fact that we
use a direct approach, without trying to find the Laplace-Stieltjes
transforms of the waiting time distributions. We will show that we can
find all average waiting times by solving a single system of about $M^2$
linear equations and we present a practical method of doing so. We
remark that our results can be applied to the case of zero switching
times and have a very simple form when the queues are identical.

In many communication systems, like computer networks, besides
transmitting messages, one must also convey their origins or destinations.
This can significantly increase the incurred delay. We will show how
the previous queueing disciplines can be applied to reduce this overhead.

In Section B we present some relations valid for both disciplines.
The "please-wait" case is treated in Section C and the "exhaustive-
service" discipline in Section D. In Section E we present the simple
modification that must be made to the previous results when the arrival
processes are compound Poisson processes. In Section F, we propose to
use an iterative algorithm to solve the system of equations and show
that it converges. The application described above will be treated in
Section G.

B Some  Relations  Valid  for  Both  Disciplines 

Results in this section are very general. They hold not only for
the two service disciplines we consider, but also for many others, e.g.
if one limits in some way the number of messages that can be served in
one scan, as long as the system of queues remains stable.

We consider the system as being in the stationary state and the
server as undergoing an alternation of switching periods of length $c_i$
($-\infty < i < \infty$) and service periods of length $t_i$, the i-th service period
being spent in queue i mod M. (See Fig. 5.3) From there we define the
i-th scanning time by

$$s_i := t_{i-M} + \sum_{k=i-M+1}^{i-1} (c_k + t_k) + c_i = \sum_{k=i-M}^{i-1} (t_k + c_{k+1}) \qquad (1)$$

and the i-th intervisit time by

$$v_i := \sum_{k=i-M+1}^{i-1} (c_k + t_k) + c_i \qquad (2)$$

In the steady state, we have the following relations between the means
and variances of the service period lengths:

$$E[t_i] = E[t_{i \bmod M}], \quad \mathrm{var}(t_i) = \mathrm{var}(t_{i \bmod M}) \qquad (3)$$

and similarly for the switching, intervisit and scanning times. From
(3) the average of (1) is independent of i, and

$$E[s_i] = E[s]$$

We can find the value of E[s] by the following reasoning. Let T be
the time for n scanning times relative to queue M to take place.
Say $T = s_M + s_{2M} + \ldots + s_{nM}$. Denote by $m_j^{in}$ the number of messages
arriving in queue j during T, by $m_j^{out}$ the number of messages
leaving queue j and by $l_{j1}, l_{j2}, \ldots, l_{j m_j^{out}}$ the lengths of these
messages.

We then have

$$\frac{T}{n} = \sum_{j=1}^{M} \frac{1}{n} \sum_{i=1}^{n} c_{j+M(i-1)} + \sum_{j=1}^{M} \frac{1}{n} \sum_{i=1}^{m_j^{out}} l_{ji}$$

$$= \sum_{j=1}^{M} \frac{1}{n} \sum_{i=1}^{n} c_{j+M(i-1)} + \sum_{j=1}^{M} \frac{T}{n} \, \frac{m_j^{in}}{T} \, \frac{m_j^{out}}{m_j^{in}} \, \frac{1}{m_j^{out}} \sum_{i=1}^{m_j^{out}} l_{ji}$$

Let us see what happens as n goes to infinity. We show in Section
H that if

$$\sum_{i=1}^{M} \rho_i < 1 \quad \left( \rho_i := \frac{\lambda_i}{\mu_i} \right)$$

the queueing system is stable and the process $\{s_i,\ i = \ldots, -1, 0, 1, \ldots\}$
is ergodic; thus $\frac{T}{n}$ goes to E[s] with probability one as n
increases. By the law of large numbers, $\frac{1}{n} \sum_{i=1}^{n} c_{j+M(i-1)}$ goes to
$\nu_j$, $\frac{m_j^{in}}{T}$ to $\lambda_j$ and $\frac{1}{m_j^{out}} \sum_{i=1}^{m_j^{out}} l_{ji}$ to $\frac{1}{\mu_j}$, all with probability
one. $\frac{m_j^{out}}{m_j^{in}}$ goes to 1 if the system is stable. So we obtain:

$$E[s] = \frac{\sum_{j=1}^{M} \nu_j}{1 - \sum_{j=1}^{M} \rho_j}$$

This expression is meaningful only if $\sum_{i=1}^{M} \rho_i < 1$, as expected.

One finds similarly that

$$E[t_j] = \rho_{j \bmod M} \, E[s]$$

and

$$E[v_j] = \left( 1 - \rho_{j \bmod M} \right) E[s]$$

From now on we will assume $\sum_{i=1}^{M} \rho_i < 1$, $\sum_{i=1}^{M} \nu_i > 0$ and we will use
the index j where we should use j mod M.
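
These first-moment relations are easy to evaluate. The sketch below assumes the mean scanning time is the total mean switching time divided by one minus the total load, with mean service period $\rho_j E[s]$ and mean intervisit time $(1 - \rho_j) E[s]$; the function name, the argument names (`lams`, `mus`, `nus` for arrival rates, service rates and mean switching times) and the example values are hypothetical.

```python
def cycle_means(lams, mus, nus):
    """Mean scanning, service-period and intervisit times of a cyclic system."""
    rhos = [lam / mu for lam, mu in zip(lams, mus)]
    total_rho = sum(rhos)
    if total_rho >= 1.0:
        raise ValueError("unstable: the sum of the rho_j must be below 1")
    mean_scan = sum(nus) / (1.0 - total_rho)
    mean_service = [rho * mean_scan for rho in rhos]
    mean_intervisit = [(1.0 - rho) * mean_scan for rho in rhos]
    return mean_scan, mean_service, mean_intervisit

# Hypothetical two-queue example.
scan_time, service, intervisit = cycle_means(
    lams=[0.1, 0.2], mus=[1.0, 1.0], nus=[0.5, 0.5])
print(scan_time, service, intervisit)
```

A useful consistency check is that one cycle consists of all service periods plus all switching periods, so the mean service periods and mean switching times together add up to the mean scanning time.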

C Waiting  Times  for  the  "Please  Wait"  Discipline 

We  proceed  in  three  steps.  First  we  will  express  the  average 
waiting  times  as  functions  of  the  moments  of  the  scanning  times.  We 
find  then  a relation  between  the  moments  of  the  scanning  times  and  those 
of  the  service  period  lengths.  Finally  we  show  that  these  are  related 
to  the  solution  of  a certain  system  of  linear  equations. 

Suppose we observe a message entering queue i and we note
that it arrives u units of time before the end of a scanning time
(relative to i) of length z and that it finds n messages in front of
it. u, z and n are random variables. For a first in first out
service, and conditioned on n, u and z, the Laplace-Stieltjes transform
of the distribution of the waiting time of our message is

$$E\left(e^{-x w_i} \mid n, u, z\right) = \left(B_i^*(x)\right)^n e^{-ux}$$

where $B_i^*$ is the Laplace-Stieltjes transform of the distribution
function of the service time of a message in queue i.




We  will  now  remove  the  conditioning.  Averaging  on  n , the 
ntunber  of  Poisson  events  in  a period  of  length  :-u  , we  obtain 


-w.  X 

E(e  ^ u,2) 


® (X.  (z-u))”  -X.  Cz-u) 

E CB^(x))V 

n=0 

X^(z-u) [B^Cx)-l]  g-ux 


e 


-ux 


* e 

The  arrival  process  being  Poisson,  u is  uniformly  distributed  between 
0 and  z so 


E(e 


-W.Xi 

1 


1 


X.z[BrCx)-l] 


z X + X.  B*  (x)  - X. 

1 1 


If the scanning times relative to the i-th queue have a distribution function $\Pr[s_i \le x] = S_i(x)$ with Laplace-Stieltjes transform $S_i^*$, we show in Appendix A that $\Pr[z \le x] = \frac{1}{E[s]} \int_0^x y \, dS_i(y)$ (this would be a well known result of renewal theory if the scanning times relative to the i-th queue were independent); from there

$$E(e^{-w_i x}) = \frac{1}{E[s]} \, \frac{S_i^*(\lambda_i (1 - B_i^*(x))) - S_i^*(x)}{x + \lambda_i B_i^*(x) - \lambda_i}$$

Differentiating, one finds the average waiting time in queue i:

$$E[w_i] = \frac{(1+\rho_i) E[s_i^2]}{2 E[s]} = E[s] \frac{(1+\rho_i)}{2} + \frac{(1+\rho_i) \, \mathrm{var}(s_i)}{2 E[s]} \tag{5}$$
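The step from the second moment of the scanning time to the mean-plus-variance form of the waiting time can be checked numerically. The following is a minimal sketch, assuming the "please wait" mean waiting time is $(1+\rho_i) E[s_i^2] / (2E[s])$; the function and argument names are illustrative and do not come from the text.

```python
def please_wait_mean_wait(rho_i, mean_scan, var_scan):
    """Mean wait written with the second moment of the scanning time."""
    second_moment = var_scan + mean_scan ** 2
    return (1.0 + rho_i) * second_moment / (2.0 * mean_scan)


def please_wait_mean_wait_split(rho_i, mean_scan, var_scan):
    """Same quantity, split into a mean term and a variance term."""
    mean_term = mean_scan * (1.0 + rho_i) / 2.0
    variance_term = (1.0 + rho_i) * var_scan / (2.0 * mean_scan)
    return mean_term + variance_term


if __name__ == "__main__":
    # Deterministic scanning time of 10 bit times, load 0.2 in the queue.
    print(please_wait_mean_wait(0.2, 10.0, 0.0))
```

With a deterministic scanning time the variance term vanishes, and only the term proportional to the mean scanning time remains.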


Let us now find a relation between $\mathrm{var}(s_i)$ and $\mathrm{var}(t_i)$. If $n_i$ is the (random) number of messages present in queue i when the service starts, $t_i$ is the sum of $n_i$ independent service times, so

$$E[t_i \mid n_i = n] = \frac{n}{\mu_i}$$

$$E[t_i^2 \mid n_i = n] = n\Big[\theta_i - \frac{1}{\mu_i^2}\Big] + \frac{n^2}{\mu_i^2}$$

$n_i$ in turn is the number of arrivals in queue i during $s_i$, so

$$E[n_i \mid s_i = s] = \lambda_i s$$

$$E[n_i^2 \mid s_i = s] = \lambda_i^2 s^2 + \lambda_i s$$

and

$$E[t_i \mid s_i = s] = \rho_i s \tag{6}$$

$$\mathrm{var}(t_i) = \lambda_i \theta_i E[s] + \rho_i^2 \, \mathrm{var}(s_i) \tag{7}$$


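As announced, we now reduce the problem of evaluating (5) to solving a system of linear equations.

From (1) we have

$$\frac{\mathrm{var}(s_i)}{E[s]} = \sum_{k=i-M}^{i-1} \sum_{j=i-M}^{i-1} R_{jk} \tag{8}$$

where

$$R_{ij} := \frac{E[(t_i + c_{i+1})(t_j + c_{j+1})] - E[t_i + c_{i+1}] \, E[t_j + c_{j+1}]}{E[s]} \tag{9}$$

Clearly

$$R_{ij} = R_{ji} = R_{(i+kM)(j+kM)}, \qquad k = \ldots, -1, 0, 1, \ldots \tag{10}$$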

but in general $R_{ij} \ne R_{i(j+M)}$.
The reason for dividing by E[s] in (8) appears before formula (16).
From (9), and then (7) and (8), we obtain

$$R_{ii} = \frac{\mathrm{var}(t_i)}{E[s]} + \frac{\mathrm{var}(c_{i+1})}{E[s]} = \lambda_i \theta_i + \rho_i^2 \sum_{j=i-M}^{i-1} \sum_{k=i-M}^{i-1} R_{jk} + \frac{\sigma_{i+1}^2}{E[s]} \tag{11}$$


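If $i > j$,

$$R_{ij} E[s] = E[t_i (t_j + c_{j+1})] - E[t_i] \, E[t_j + c_{j+1}]$$

$$= E\big[ E[t_i \mid \{t_k\}, \{c_{k+1}\}, k < i] \, (t_j + c_{j+1}) \big] - E\big[ E[t_i \mid \{t_k\}, \{c_{k+1}\}, k < i] \big] \, E[t_j + c_{j+1}]$$

The outside expectations are on the $t_k$'s and $c_{k+1}$'s, $k < i$.

By (6) and (1),

$$R_{ij} E[s] = E\Big[ \rho_i \sum_{k=i-M}^{i-1} (t_k + c_{k+1})(t_j + c_{j+1}) \Big] - \rho_i \sum_{k=i-M}^{i-1} E[t_k + c_{k+1}] \, E[t_j + c_{j+1}]$$

that is,

$$R_{ij} = \rho_i \sum_{k=i-M}^{i-1} R_{kj}, \qquad i > j \tag{12}$$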


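If we define the set I as $\{(i,j) \in Z^2 : 1 \le i \le M,\ i-M+1 \le j \le i\}$ we can obtain a system of $M^2$ linear equations in the $M^2$ unknowns $R_{ij}$, $(i,j) \in I$, by rewriting (11) and (12) as

$$R_{ii} = \rho_i^2 \Big[ \sum_{j=1}^{M} R_{jj} + 2 \sum_{j=i-M+1}^{i-1} \sum_{k=i-M}^{j-1} R_{jk} \Big] + \lambda_i \theta_i + \frac{\sigma_{i+1}^2}{E[s]} \tag{13}$$

$$R_{ij} = \rho_i \Big[ \sum_{k=i-M}^{j} R_{jk} + \sum_{k=j+1}^{i-1} R_{kj} \Big], \qquad (i,j) \in I,\ i \ne j \tag{14}$$

and using relation (10) where necessary. We present in Section 6 a practical way of solving this system.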

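From (5), (8) and (11) we obtain for the average waiting time in queue i

$$E[w_i] = \frac{E[s](1+\rho_i)}{2} + \frac{(1+\rho_i)}{2\rho_i^2} \Big[ R_{ii} - \Big( \lambda_i \theta_i + \frac{\sigma_{i+1}^2}{E[s]} \Big) \Big], \qquad i = 1, 2, \ldots, M \tag{15}$$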


For  example  when  M=2  we  have  the  system 


“u  ■ * “22  ' ““loJ  * \®1  * -rr-: 

E[s] 


*^22  ■ ^2  ^■’^11  '^22  ^*^21^  ^2®^  - 


E[s] 


RlO  = Pi  ['^21  *^22^ 


21 


P2  I>^io  * *^11^ 


which  yields 


11 


2 ^2 

^2  2 *^12 
(Xi0i+ — ) (1-Pj^P2-P2(1-^P^P2^)-^  (^2®2'"~^‘^1*-^'^1^2*^^P 
E[s] E[^ 

(l-p^-p2)  (l+0j+p2+Pjp2(l+Pi+O2  + 20^P2)) 


and
E[s]CUp-)

E[w  ] = L + 

2 


Cl+pp  [ CVj0^+— ) Cl+P^P2+2p^P2+2p2)  + (A-,02+— ) Cl-p^p2+2p) 
E[j E[s] 

2 Cl-p^-02)  (l+p^+P2+P3^P2(l+Pj+P2+2PjP2)) 


In the case of vanishing switching times, so that E[s] and $\sigma_i^2 / E[s]$ become null, the system (13), (14) remains valid and

$$E[w_i] = \frac{(1+\rho_i)}{2\rho_i^2} \, [R_{ii} - \lambda_i \theta_i], \qquad i = 1, \ldots, M$$


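In the important case where the queues are identical, or more precisely if $\rho_i = \rho$ and $\lambda_i \theta_i + \sigma_{i+1}^2 / E[s] = \lambda\theta + \sigma^2 / E[s]$, $i = 1, 2, \ldots, M$, we find that for $(i,j) \in I$

$$R_{ij} = \frac{\rho R_{ii}}{1-(M-1)\rho}, \qquad i \ne j$$

$$R_{ii} = \frac{(1-(M-1)\rho) \, [\lambda\theta + \sigma^2 / E[s]]}{(1+\rho)(1-M\rho)}$$

so that

$$E[w_i] = E[s] \frac{(1+\rho)}{2} + \frac{M}{2(1-M\rho)} \Big( \lambda\theta + \frac{\sigma^2}{E[s]} \Big), \qquad i = 1, 2, \ldots, M \tag{17}$$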


The $v_i$'s need not be equal for relation (17) to hold. We see that the part of the delay due to the switching times is equal to

$$E[s] \frac{(1+\rho)}{2} + \frac{M}{2(1-M\rho)} \frac{\sigma^2}{E[s]} = \frac{Mv(1+\rho)}{2(1-\rho M)} + \frac{\sigma^2}{2v}$$

If the queues are not identical, the overhead is more difficult to assess. However, if the $\sigma_i^2$'s are all zero, one deduces from formula (15) that the existence of switching times causes an extra delay $E[s](1+\rho_i)/2$ for messages in the i-th queue. Other moments, like the average queue lengths and the means and variances of the number of customers served in one scan, can easily be computed from the previous results.


D. Waiting Time for the "Exhaustive Service" Discipline

The  method  used  in  this  section  is  very  similar  to  the  one  used 
in  Part  C . 

The customers present in queue i when $t_i$ starts have arrived during $v_i$. [Avi-Itzhak, Maxwell and Miller, 1965] and [Eisenberg, 1972] found an expression for the average waiting time in queue i:

$$E[w_i] = \frac{\lambda_i \theta_i}{2(1-\rho_i)} + \frac{E[v_i^2]}{2E[v_i]} = \frac{\lambda_i \theta_i}{2(1-\rho_i)} + \frac{E[v_i]}{2} + \frac{\mathrm{var}(v_i)}{2E[v_i]} \tag{18}$$

If n customers are present in queue i when service starts we can regard $t_i$ as composed of n independent "M/G/1 busy periods" [Takács, 1962], each with mean $\frac{1}{\mu_i (1-\rho_i)}$ and second moment $\frac{\theta_i}{(1-\rho_i)^3}$.


Using this observation and a reasoning similar to the one used in Part C, one finds

$$E[t_i \mid v_i = v] = \frac{\rho_i}{1-\rho_i} \, v \tag{19}$$

$$\mathrm{var}(t_i) = \frac{\lambda_i \theta_i E[v_i]}{(1-\rho_i)^3} + \Big( \frac{\rho_i}{1-\rho_i} \Big)^2 \mathrm{var}(v_i) \tag{20}$$

Let us now find the system of equations; from (2)

$$\frac{\mathrm{var}(v_i)}{E[s]} = \sum_{j=i-M+1}^{i-1} \sum_{k=i-M+1}^{i-1} K_{jk} + \frac{\sigma_i^2}{E[s]} \tag{21}$$

where

$$K_{ij} := \frac{E[(t_i + c_i)(t_j + c_j)] - E[t_i + c_i] \, E[t_j + c_j]}{E[s]}$$

and has the same properties as $R_{ij}$ in (9), (10).
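Using (20), (21), (19) and (2),

$$K_{ii} = \frac{\lambda_i \theta_i}{(1-\rho_i)^2} + \Big( \frac{\rho_i}{1-\rho_i} \Big)^2 \Big[ \sum_{j=i-M+1}^{i-1} \sum_{k=i-M+1}^{i-1} K_{jk} + \frac{\sigma_i^2}{E[s]} \Big] + \frac{2\rho_i}{1-\rho_i} \frac{\sigma_i^2}{E[s]} + \frac{\sigma_i^2}{E[s]}$$

$$= \frac{\lambda_i \theta_i}{(1-\rho_i)^2} + \frac{\sigma_i^2}{(1-\rho_i)^2 E[s]} + \Big( \frac{\rho_i}{1-\rho_i} \Big)^2 \sum_{j=i-M+1}^{i-1} \sum_{k=i-M+1}^{i-1} K_{jk} \tag{22}$$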


and by (19) and (2)

$$K_{ij} = \frac{\rho_i}{1-\rho_i} \sum_{k=i-M+1}^{i-1} K_{kj}, \qquad i > j \tag{23}$$


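Defining the set J by $J = \{(i,j) \in Z^2 : 1 \le i \le M,\ i-M+2 \le j \le i\}$, we obtain a system of $M(M-1)$ linear equations in the unknowns $K_{ij}$, $(i,j) \in J$, by rewriting (22) and (23) as

$$K_{ii} = \frac{1}{(1-\rho_i)^2} \Big[ \lambda_i \theta_i + \frac{\sigma_i^2}{E[s]} \Big] + \Big( \frac{\rho_i}{1-\rho_i} \Big)^2 \Big[ \sum_{j=1,\, j \ne i}^{M} K_{jj} + 2 \sum_{j=i-M+2}^{i-1} \sum_{k=i-M+1}^{j-1} K_{jk} \Big] \tag{24}$$

$$K_{ij} = \frac{\rho_i}{1-\rho_i} \Big[ \sum_{k=i-M+1}^{j} K_{jk} + \sum_{k=j+1}^{i-1} K_{kj} \Big], \qquad (i,j) \in J,\ i \ne j \tag{25}$$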


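From (18), (21) and (22),

$$E[w_i] = \frac{E[s](1-\rho_i)}{2} + \frac{K_{ii}(1-\rho_i)}{2\rho_i^2} - \frac{(1+\rho_i)}{2\rho_i^2} \Big( \lambda_i \theta_i + \frac{\sigma_i^2}{E[s]} \Big), \qquad i = 1, \ldots, M$$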

As  in  part  C this  solution  remains  valid  when  the  switching  times  vanish. 

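When $\rho_i = \rho$ and $\lambda_i \theta_i + \sigma_i^2 / E[s] = \lambda\theta + \sigma^2 / E[s]$, we obtain for $(i,j) \in J$

$$K_{ij} = \frac{\rho K_{ii}}{1-(M-1)\rho}, \qquad i \ne j$$

$$K_{ii} = \frac{(1-(M-1)\rho) \, [\lambda\theta + \sigma^2 / E[s]]}{(1-\rho)(1-M\rho)}$$

so that

$$E[w_i] = E[s] \frac{(1-\rho)}{2} + \frac{M}{2(1-M\rho)} \Big( \lambda\theta + \frac{\sigma^2}{E[s]} \Big), \qquad i = 1, 2, \ldots, M \tag{26}$$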


The difference between the result for the "please wait" discipline (17) and this one is $\rho E[s]$. This corresponds to the fact that the fraction of messages arriving in a queue that the server is emptying, i.e. $\rho$, is delayed an extra scanning time in the "please wait" case.
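The identical-queue results for the two disciplines can be compared numerically. The sketch below assumes the closed forms $E[s](1 \pm \rho)/2 + M(\lambda\theta + \sigma^2/E[s]) / (2(1-M\rho))$ and, as suggested by the switching-delay expression of Part C, a mean scanning time $E[s] = Mv/(1-M\rho)$. All names are illustrative, not the author's.

```python
def mean_wait_identical(M, lam, mean_b, second_moment_b, v, var_c=0.0,
                        exhaustive=False):
    """Mean waiting time for M identical queues under cyclic service.

    lam: arrival rate per queue; mean_b, second_moment_b: service moments;
    v, var_c: mean and variance of one switching time (assumed notation).
    """
    rho = lam * mean_b
    if M * rho >= 1.0:
        raise ValueError("unstable: M * rho must be below 1")
    cycle = M * v / (1.0 - M * rho)  # assumed mean scanning time E[s]
    sign = -1.0 if exhaustive else 1.0
    common = M * (lam * second_moment_b + var_c / cycle) / (2.0 * (1.0 - M * rho))
    return cycle * (1.0 + sign * rho) / 2.0 + common


if __name__ == "__main__":
    args = dict(M=4, lam=0.01, mean_b=10.0, second_moment_b=120.0, v=2.0)
    wait_pw = mean_wait_identical(**args)
    wait_ex = mean_wait_identical(exhaustive=True, **args)
    print(wait_pw, wait_ex, wait_pw - wait_ex)
```

Under these assumptions the gap between the two disciplines is $\rho E[s]$ regardless of the service-time second moment, which is the observation made in the text.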


E. Generalization to Compound Poisson Processes


To  be  complete,  we  investigate  here  the  simple  modifications 
that  must  be  brought  to  the  previous  theory  when  the  arrival  processes 
are  modeled  as  compound  Poisson  processes.  This  is  sometimes  a 
realistic  model  when  data  sources  emit  messages  in  clusters  separated  by 
long idle periods. In this case the i-th queue is characterized by the
following  statistics:  clusters  of  messages  arrive  in  a Poisson  manner, 
at  a rate  of  clusters  per  unit  of  time.  A cluster  is  composed  of 

a random  number  of  messages.  Let  the  mean  number  and  mean  square  number 
of  messages  in  a cluster  be  and  respectively.  The  message 
lengths  and  switching  times  have  the  same  means  and  variances  as  in 
previous  sections,  and  we  assume  all  interarrival,  service  and  switching 
times,  and  the  number  of  messages  in  a cluster,  to  be  independent. 

If  we  consider  the  set  of  messages  present  in  a cluster  as  a 
supermessage,  with  mean  length  and  mean  square  length  of 

^ *-^i"^i^  [Karlin  and  Taylor,  1975,  p.  13]  respectively,  the 
supermessages  will  arrive  in  a Poisson  manner  so  that  the  analysis  of 
sections 2, 3 and 4 remains valid, as far as the scanning, intervisit
in all formulas.


The average waiting time of a message is equal to the average waiting time of the corresponding supermessage, plus a term taking into account the average time necessary to serve other messages in the same cluster. The average extra delay suffered by the n-th message served in a cluster is equal to $(n-1)/\mu_i$, so the average sum of the extra delays suffered by all messages in a cluster containing exactly n messages is equal to $\frac{n(n-1)}{2\mu_i}$. Averaging on n and dividing by the average number of messages in a cluster yields an average message extra delay of

^i  ■ i 1 


25.  p. 

1 1 
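The averaging step just described can be written out as a small computation. This is a sketch under the stated reasoning only: the total extra delay in a cluster of n messages is n(n-1)/2 mean service times, and the result is divided by the mean cluster size. The argument names are hypothetical stand-ins for the mean and mean square number of messages per cluster and for the mean service time.

```python
def cluster_extra_delay(mean_n, mean_square_n, mean_service):
    """Average extra delay per message caused by earlier messages
    of the same cluster: E[n(n-1)/2] * mean_service / E[n]."""
    return (mean_square_n - mean_n) * mean_service / (2.0 * mean_n)


def cluster_extra_delay_direct(size_probs, mean_service):
    """Same quantity computed directly from a cluster-size distribution
    given as {size: probability}."""
    total = sum(p * n * (n - 1) / 2.0 for n, p in size_probs.items())
    mean_n = sum(p * n for n, p in size_probs.items())
    return total * mean_service / mean_n


if __name__ == "__main__":
    # Clusters of size 1 or 3 with equal probability, unit mean service time.
    print(cluster_extra_delay(2.0, 5.0, 1.0))
    print(cluster_extra_delay_direct({1: 0.5, 3: 0.5}, 1.0))
```

When every cluster holds a single message the extra delay is zero, and the compound model reduces to the ordinary Poisson case.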


F.  Properties  of  the  Systems  of  Equations 


In  the  first  part  of  this  section,  we  present  alternate  forms 
for  the  systems  of  equations  (13)  (14)  and  (24)  (25) . These  new  systems 
contain  more  unknowns  but  have  a simpler  structure,  which  is  useful  when 
the  time  comes  to  solve  them  numerically.  In  the  second  part  we  show 
that  all  systems  considered  in  this  paper  can  be  solved  by  an  efficient 
iterative  algorithm. 

Using equation (12) we can rewrite (11) as

$$R_{ii} = \rho_i \sum_{j=i-M}^{i-1} R_{ij} + \lambda_i \theta_i + \frac{\sigma_{i+1}^2}{E[s]} \tag{27}$$

Defining the set I' by $I' := \{(i,j) \in Z^2 : 1 \le i \le M,\ i-M \le j \le i\}$ we can obtain a set of $M(M+1)$ equations in the unknowns $R_{ij}$, $(i,j) \in I'$, by rewriting (12) and (27) as

$$R_{ij} = \rho_i \Big[ \sum_{k=i-M}^{\min(j,\,i-1)} R_{jk} + \sum_{k=j+1}^{i-1} R_{kj} \Big] + \delta_{ij} \Big( \lambda_i \theta_i + \frac{\sigma_{i+1}^2}{E[s]} \Big) \tag{28}$$

($\delta_{ij} = 1$ if $i = j$, 0 otherwise)

and using relation (10) when necessary.

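Similarly the equations (22), (23) can be rewritten as

$$K_{ij} = \frac{\rho_i}{1-\rho_i} \Big[ \sum_{k=i-M+1}^{\min(j,\,i-1)} K_{jk} + \sum_{k=j+1}^{i-1} K_{kj} \Big] + \frac{\delta_{ij}}{(1-\rho_i)^2} \Big[ \lambda_i \theta_i + \frac{\sigma_i^2}{E[s]} \Big] \tag{29}$$

or

$$K_{ij} = \rho_i \Big[ \sum_{k=i-M+1}^{j} K_{jk} + \sum_{k=j+1}^{i} K_{kj} \Big] + \frac{\delta_{ij}}{1-\rho_i} \Big[ \lambda_i \theta_i + \frac{\sigma_i^2}{E[s]} \Big] \tag{30}$$

for $(i,j)$ such that $1 \le i \le M$, $i-M+1 \le j \le i$.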


The system (28) can be rewritten in matrix form as

$$R = AR + B \tag{31}$$

where R is a column matrix formed by the $R_{ij}$, $(i,j) \in I'$. A straightforward computation of the solution of (31) can become quite lengthy, A being an $M(M+1)$ by $M(M+1)$ matrix. Instead, the form of equation (31) suggests an iterative procedure, wherein the n-th estimate of R, $\hat{R}_n$, is expressed in terms of the (n-1)-th estimate by

$$\hat{R}_n = A \hat{R}_{n-1} + B$$

By inspecting equation (28) one checks that each iteration requires only $M^3 + M^2 + M$ additions and $M^2 + M$ multiplications. The variables that need to be stored are the elements of $\hat{R}_n$ and $\hat{R}_{n-1}$, together with the $\rho_i$'s and the $(\lambda_i \theta_i + \sigma_{i+1}^2 / E[s])$'s, for a total of $2M(M+2)$ variables. A variant to the algorithm exists (see the specialized texts, e.g. [Varga, 1962]) that reduces this number to $M(M+3)$. In either case this is far from the $M^4$ that one could expect. It is known that $\hat{R}_n$ converges to the solution R when the norms of all eigenvalues of A are less than 1. Fortunately, this is the case when the system of queues is stable, as we shall see.
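The iterative procedure can be sketched in a few lines. The code below is only an illustration under stated assumptions: the unknowns are indexed by pairs (i, j) with 1 <= i <= M and i-M <= j <= i; each unknown equals rho_i times the sum of the M covariance terms with second index j and first index running from i-M to i-1; a constant (here called `A_const[i]`, standing for the term that appears only on the diagonal) is added when i = j; indices are reduced with the symmetry and M-periodicity of relation (10). The helper names `canon` and `solve_R` are hypothetical.

```python
def canon(a, b, M):
    """Representative (i, j) of an index pair, with 1 <= i <= M and j <= i,
    using symmetry and periodicity with period M."""
    if a < b:
        a, b = b, a
    shift = ((a - 1) // M) * M  # floor division also handles a <= 0
    return a - shift, b - shift


def solve_R(rho, A_const, iterations=2000):
    """Fixed-point iteration R <- A R + B, starting from zero."""
    M = len(rho)
    keys = [(i, j) for i in range(1, M + 1) for j in range(i - M, i + 1)]
    R = {key: 0.0 for key in keys}
    for _ in range(iterations):
        new_R = {}
        for (i, j) in keys:
            total = 0.0
            for k in range(i - M, min(j, i - 1) + 1):
                total += R[canon(j, k, M)]
            for k in range(j + 1, i):
                total += R[canon(k, j, M)]
            new_R[(i, j)] = rho[i - 1] * total + (A_const[i - 1] if i == j else 0.0)
        R = new_R
    return R


if __name__ == "__main__":
    solution = solve_R([0.2, 0.2, 0.2], [1.0, 1.0, 1.0])
    print(solution[(1, 1)], solution[(2, 1)])
```

Two dictionaries are kept per step, mirroring the storage of two successive estimates; overwriting a single dictionary in place would correspond to the storage-saving variant mentioned in the text. For identical queues the result can be compared with the closed form of Part C.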

If $\rho_i > 0$, $i = 1, 2, \ldots, M$, one can check that the matrix A is an irreducible nonnegative matrix, in the sense that all its elements are nonnegative and it cannot be rewritten as

$$\begin{pmatrix} A_1 & 0 \\ A_3 & A_2 \end{pmatrix} \qquad \text{(with } A_1 \text{ and } A_2 \text{ square)}$$

by any permutations of rows followed by the same permutations of columns. Among the numerous properties of this type of matrix [Gantmacher, 1960, Ch. 13], we use the following: the eigenvalue of A with the largest norm, $\alpha$, is real, positive, and bounded as follows:

$$\min_k \frac{(A)_k R}{(R)_k} \le \alpha \le \max_k \frac{(A)_k R}{(R)_k} \tag{32}$$

for all non zero vectors R with elements > 0.

We denote by $(A)_k$ and $(R)_k$ the k-th row of A and R. Now, if we use in (32) a vector R with its elements $R_{ij}$ set equal to $\rho_i \rho_j$ we find that

$$\alpha = \sum_{i=1}^{M} \rho_i < 1 \tag{33}$$
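The choice of test vector can be checked numerically. The sketch below assumes that the row of A attached to the unknown with indices (i, j) multiplies by rho_i the sum of the M unknowns with second index j and first index running from i-M to i-1 (indices taken modulo M), and it evaluates the row ratios of the bound (32) for the vector with elements rho_i rho_j. The function name `row_ratios` is hypothetical.

```python
def row_ratios(rho):
    """Ratios (A R)_k / (R)_k for the test vector R_ij = rho_i * rho_j."""
    M = len(rho)

    def test_vector(i, j):
        # Periodic and symmetric in its indices, so no index reduction is needed.
        return rho[(i - 1) % M] * rho[(j - 1) % M]

    ratios = []
    for i in range(1, M + 1):
        for j in range(i - M, i + 1):
            total = sum(test_vector(j, k) for k in range(i - M, min(j, i - 1) + 1))
            total += sum(test_vector(k, j) for k in range(j + 1, i))
            ratios.append(rho[i - 1] * total / test_vector(i, j))
    return ratios


if __name__ == "__main__":
    loads = [0.1, 0.25, 0.3]
    print(min(row_ratios(loads)), max(row_ratios(loads)), sum(loads))
```

Since every row gives the same ratio, the lower and upper bounds coincide, which pins the dominant eigenvalue to the total load.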


If some $\rho_i$'s are equal to 0, one verifies easily that relation (33) still holds. A similar algorithm can be used to find the solution of the systems (13) (14), (24) (25), (29) and (30). One finds by the same method the following relations about the dominant eigenvalue $\alpha$.


Systems 
(13)  (14) 


Relations 


M 


1 > Z p . > a > 
i»l 


M 

1=1 


(24)  (25) 


C29) 


M 

°i-»k 

1*1 

1 > E p . > max  — ; > a > min 

i=l  " k ^ - "k  ~ " k 


M 


M 

1=1 


1 - Pi 


Pk^^O 


M 


M 


U ^ P--Pv  ^ P--Pi, 

M 1 k 1 k 

1 > E p.  > max  > a > min  

i=l  " k ^ - °k  - - k ^ - '^k 

Pk^^O 


(30) 


M 

1 > E p . ■ a 
i-1  ^

G. Application to the Encoding of the Message Origins


In  the  light  of  the  strategy  used  in  Section  3 it  is  clear  how 
the  cyclic  strategies  developed  here  can  be  used  to  indicate  the  message 
origins.  It  suffices  to  queue  the  messages  from  origin  i in  a special 
buffer  that  is  emptied  in  a cyclic  fashion,  and  to  indicate  the  end  of 
the  service  with  a flag  of  length  . If  the  probability  of  insertion 
is  known,  it  is  possible  to  apply  the  previous  results  to  compute  the 
system  performances. 

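In particular, if the queues are identical and the probability of insertion is equal to $2^{-(v-1)}$, one obtains from (17) and (26)

$$Ew = 2^{-(v-1)} + \frac{Mv(1 + \lambda(Eb + 2^{-(v-1)}))}{2(1 - M\lambda(Eb + 2^{-(v-1)}))} + \frac{M\lambda(Eb^2 + 2Eb \, 2^{-(v-1)} + 2^{-(v-1)})}{2(1 - M\lambda(Eb + 2^{-(v-1)}))}$$

for the "please wait" discipline, and

$$Ew = 2^{-(v-1)} + \frac{Mv(1 - \lambda(Eb + 2^{-(v-1)}))}{2(1 - M\lambda(Eb + 2^{-(v-1)}))} + \frac{M\lambda(Eb^2 + 2Eb \, 2^{-(v-1)} + 2^{-(v-1)})}{2(1 - M\lambda(Eb + 2^{-(v-1)}))}$$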

for the "exhaustive service." The first term takes into account the possible insertion in front of a message. Here b refers to the length of a message, exclusive of any insertion.

We note that in light traffic the first two terms will dominate in both cases, whereas the presence of the protocol does not affect the capacity of the link if long enough flags are used.

H. Condition for Stability

We show here that if $\sum_{i=1}^{M} \rho_i < 1$, the queueing system is stable, and the process $\{s_{i+jM},\ j = \ldots, -1, 0, 1, \ldots\}$ formed by the lengths of the scanning times relative to queue i is ergodic.


To  keep  the  argument  short,  we  will  prove  these  results  only  in 
the  case  where,  with  probability  one,  all  service  and  switching  times 
take  only  a countable  number  of  values,  so  that  the  state  spaces  of  the 
Markov  Processes  defined  below  are  countable. 

We  define  d^:=  , t^^.  , ... 

dj^  's  form  a non  stationary  Markov  Process  and  by  (6) 


,T  . The 


E[dk,i|d^=d]  = 


‘^k  '^k  ^k 
Q 0 Q 


0/ 


for the "please wait" case. If the "exhaustive service" discipline is used, the expression is similar except that the first $\rho_k$ in the square matrix above is replaced by 0, and the others by $\rho_k / (1-\rho_k)$ (by (19)). In both cases we can write


E[d^.l|d,.d]  = d 




We  consider  now  the  process  > k=.  ••  ,-1,0,1, .. . for  i 

fixed.  It  forms  a stationary  Markov  chain,  all  values  of  the  form 
(0  , c^^j^  , 0 , c?^2  •••■>  0 , , where  the  cj  's  have  non 

zero  probability,  are  accessible  in  one  step  from  all  states,  so  the 
process  is  either  recurrent  or  transient.  One  finds  that 


^ ‘^i+(k+l)M  '^i+kM  “ 


^i  ^i+M-1  ^i+M-2  • • • ^i 


for  some  • If  the  eigenvalue  of  ^i+M-2 

largest  norm,  a , is  less  than  one,  for  any  initial  conditions , the 

mean  of  is  uniformly  bounded,  so  the  process  is  positive 

recurrent.  Using  the  same  technique  as  in  Section  F,  specifically 

T 

formula  C32)  with  test  vector  (P • , 0 , P ■ , , 0 , . . . p . , 0)  , one 

checks  that  a is  < , = , > 1 with  Z p . . 

j = l ^ 

If  the  ergodic,  then,  a fortiori,  so  are  the 

*i+k,M  because  they  are  equal  to  the  sum  of  the  elements  of  the 

d.  's. 
i*kM 



8. Comparison of the Practical Coding Schemes

In the previous sections we have analysed four different practical coding schemes. Which one is the best? If the input statistics are known, the performances can be computed and the various parameters optimized. Only then can one decide. It is however possible to make some general statements, as we will do here. For the sake of simplicity we assume that all sources have the same statistics, that all flags have length v and that the probability of insertion is $2^{-(v-1)}$.

For  convenience  we  reproduce  here  the  formulas  for  the  average 
waiting  time: 

I.  First  In  First  Out:  (formula'- 3 5 4 of  Section  ) 

Ew  = ^ * En  ^ '^UECb^n]^  > 2 E(b*n) 

‘ 2(1  - MX(E(b*n) 

We recall that n is of the order of $\log_2 M$.

II.  Sampling:  (formula  11  of  Section  5 ) 

Ew  . - I . ^ , — 

^ 1 - MA(Eb  + 

^ MA(Eb^  * 2Eb  2'*-^'^^ 

2(1  - Mx(Eb  + 2‘^''‘^^)) 

III.  Please  Wait:  (formula  5 of  Section  ? ) 

$$Ew = 2^{-(v-1)} + \frac{Mv(1 + \lambda(Eb + 2^{-(v-1)}))}{2(1 - M\lambda(Eb + 2^{-(v-1)}))} + \frac{M\lambda(Eb^2 + 2Eb \, 2^{-(v-1)} + 2^{-(v-1)})}{2(1 - M\lambda(Eb + 2^{-(v-1)}))}$$

IV.  Exhaustive  Service:  (formula  26  of  Section  7 ) 

$$Ew = 2^{-(v-1)} + \frac{Mv(1 - \lambda(Eb + 2^{-(v-1)}))}{2(1 - M\lambda(Eb + 2^{-(v-1)}))} + \frac{M\lambda(Eb^2 + 2Eb \, 2^{-(v-1)} + 2^{-(v-1)})}{2(1 - M\lambda(Eb + 2^{-(v-1)}))}$$


One  sees  immediately  that,  when  the  different  origins  have  the 
same  statistics,  strategy  III  is  better  than  II  if  M > 1 , but  not  as 
good  as  strategy  IV.  The  relative  difference  between  III  and  IV  is 
generally small. If M > 1, the overhead in strategy II is double the overheads in III and IV. If M = 1, II is equivalent to III.

In  light  traffic,  strategy  I is  better  than  IV,  because 
+ log  M ^ . However,  strategy  IV  performs  better  in  heavy  traffic; 
if v is large enough, the presence of the protocol does not diminish the traffic that strategy IV can handle. All of this is consistent with what was said in Section 3: in light traffic it is hardly possible to
reorder  the  messages,  thus  strategy  I must  be  almost  optimal.  In 
contrast,  strategy  IV  works  well  when  many  messages  from  each  origin  are 
served  in  every  scan,  because  the  flag  is  used  only  once  for  each  batch. 
Note that, as indicated in Chapter 3, strategies II, III and IV would work better if the flag lengths were allowed to vary from message to message in a batch, according to the probability (as computed by the receiver) that the batch will terminate after the present message.

The  observation  that  Strategy  I works  well  in  light  traffic 
and  Strategy  IV  in  heavy  traffic  suggests  a hybrid  scheme,  similar  to 
what  [Hayes,  1976]  and  [Capetanakis,  1977]  use  in  another  context. 

The idea is to group the M origins in M' groups (M' ≤ M), say origins 1 and 2 in group 1, 3 and 4 in group 2, etc. Strategy IV (or II or III) is used to differentiate between the groups, while prefixes are used to indicate the origins inside of a group. In the example just mentioned, messages from odd origins would be prefixed with a "0", the others with a "1". By varying the size of M' one obtains a continuum of possibilities, ranging from M' = 1 (optimal in light traffic) to M' = M (best in heavy traffic). The performances of
this  scheme  can  be  obtained  by  modifying  in  a trivial  fashion  the 
results  for  Strategy  IV  (or  II  or  III). 
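The grouping rule of the hybrid scheme can be made concrete with a short sketch. It assumes, beyond what the text states, that every group has the same size, that this size is a power of two, and that the prefix is the fixed-length binary index of the origin inside its group; the function name `hybrid_label` is hypothetical.

```python
def hybrid_label(origin, group_size):
    """Return (group number, prefix bits) for an origin numbered from 1.

    The group number would be conveyed by the cyclic strategy; the prefix
    distinguishes the origins inside one group.
    """
    if group_size < 1 or group_size & (group_size - 1):
        raise ValueError("group_size is assumed to be a power of two")
    group, position = divmod(origin - 1, group_size)
    width = group_size.bit_length() - 1  # log2(group_size) prefix bits
    prefix = format(position, "0{}b".format(width)) if width else ""
    return group + 1, prefix


if __name__ == "__main__":
    for origin in range(1, 5):
        print(origin, hybrid_label(origin, 2))
```

With a group size of 2 this reproduces the example of the text: odd origins receive the prefix "0" and even origins the prefix "1". A group size of 1 needs no prefix at all, and a single group containing all M origins uses only prefixes.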

Another  point  that  we  will  investigate  is  the  relation  between 
the  average  message  waiting  time  and  the  average  number  of  protocol 
bits per message, denoted by h, which is equal to $1/(M\lambda) - Eb$ (formula (1) of Section 2). To be able to compare these results with those of Section 3 we will rather compute the relation between the average number of protocol bits per message and the average number of messages waiting for service, Em, which by Little's formula [Little, 1961] is given by $Em = M \lambda \, Ew$.

As  we  have  noted  earlier,  some  of  the  protocol  bits  convey 
information  about  idle  times,  and  some  about  message  origins.  In  Section 
3, all protocol bits transmit information about the origins. The
comparison with Section 3 will still be meaningful in heavy traffic,
where  the  encoding  of  the  origins  uses  up  most  of  the  protocol  bits. 
This  is  clear  in  the  case  of  Strategy  I. 

There, 

Em  » ^ + MAEn 

+ MXCE(b^-n)^  2E(bfn)  f 

2Ch  - En  - 


or 


En  + 2 


-Cv-1)  ^ (ECb-nn)^  f 2E(b-t-n)  2'*-^"^^:) 


Em 

MX 


■ 2 ■ ^ 


En 


The  first  term  represents  information  about  the  origins.  As  Em 
increases,  so  does  the  optimal  v and  h tends  to  En  , as  should  be. 

In  the  case  of  Strategy  II,  the  third  term  in  the  formula  for  Ew 
will  dominate  in  heavy  traffic.  We  will  thus  have 

Mv 


Em  a 


h - 


Em 

Optimizing on v and neglecting the integer constraint, one finds that the optimal v is given by $v = \log_2 [2 (\log_e 2) \, Em/M]$. This value of v justifies the approximation of Ew by the third term in the formula above. Using this value in the formula for h, one obtains

h . |j  logj  (2.  (log,  2) 


which has exactly the same form as what was found for Strategy II of
Section 3.0,  except  that  a factor  j is  missing  here.  This  is  easy  to
explain  qualitatively:  the  only  difference  between  the  situations 
in  Section  3.0  and  in  this  section  is  that  the  number  of  messages 



served  in  one  scan  is  variable  here,  which  causes  a loss  of  efficiency 

because  is  a convex  function. 

The  cases  of  Strategies  III  and  IV  are  similar,  we  treat  IV  only. 

The  second  term  in  the  expression  for  Ew  will  eventually  dominate. 

- fv-ll 

Neglecting  the  term  2 ^ •'in  the  numerator,  we  obtain 


2Ch 


or 


h > ^ (1  . XEb) 

The  optimal  v is  given  by 

[4  Clog.  2) 


log. 


XEb 


Em 

M 


and  the  resulting  h is  equal  to 

Im  ^°®2  (1  - XEb)  M 


. Cl  - XEb)  M , 
h - 5 i-  ^ log, 


This  is  about  twice  as  efficient  as  Strategy  II,  but  less  efficient 
by  a factor  of  two  than  the  comparable  strategy  of  Section  3.0. 

We can thus conclude that although in heavy traffic strategy IV is the most efficient of the strategies we analyzed, it is probably far from being optimal, as indicated by the results of Section 3. Nevertheless enormous gains can be realized by using it in heavy traffic, as illustrated in the following numerical example.

Fixed length messages arrive at a concentrator in a Poisson manner, at a rate λ on each of M input lines. We want to transmit on a noiseless, binary, synchronous output link not only the messages, but also their origins.

Usually  this  is  done  by  prefixing  messages  with  an  address.  In 
some  cases  this  scheme  significantly  increases  the  average  delay 
incurred  by  the  messages,  as  a numerical  example  will  show. 

Let us use as time unit the interval necessary to transmit one bit on the output link and let us take M = 16, the length $1/\mu = 50$ and $\lambda = 1/1000$. If we naively forget about the addresses, we obtain from the formula of the mean waiting time in an M/D/1 queue:

$$E[w] = \frac{1}{2} \, \frac{M\lambda \, (1/\mu)^2}{1 - M\lambda/\mu} = 100$$

If we use a 4 bit address and prefix all messages with a "1" to distinguish them from idle periods during which we transmit "0"'s, the length becomes 55 (a 10% overhead) but the delay becomes

$$E[w] = \frac{1}{2} \, \frac{\frac{16}{1000} (55)^2}{1 - \frac{16 \cdot 55}{1000}} + .5 \approx 202$$

(the term .5 takes into account the synchronous nature of the output link). The presence of the addresses doubles the mean waiting time in queue.


Another simple way of transmitting the origin of the messages is to use the cyclic, exhaustive service discipline. We queue messages in a buffer corresponding to their origin, prefix them with a "1" so that their length is now 51 bits, process every queue in turn and when it is empty transmit a "0". Our "switching time" has thus mean v = 1 and variance $\sigma^2 = 0$. From formula (26) of Section 7,

$$E[w] = \frac{1}{2} \, \frac{16 \cdot 1 \cdot (1 - 51/1000)}{1 - 816/1000} + \frac{1}{2} \, \frac{16 \cdot \frac{1}{1000} (51)^2}{1 - 816/1000} \approx 154$$

The  improvement  is  due  to  the  fact  that  this  way  of  transmitting  the 
address  is  naturally  adaptive.  When  many  messages  are  waiting  in  queue, 
few  bits  per  message  are  needed  to  indicate  the  origin.  Of  course,  this 
strategy  works  well  only  when  the  traffic  is  heavy,  but  this  is  precisely 
the time when it is worth reducing queueing delays. As the traffic
grows heavier, this scheme works better and better.
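The three delays of the numerical example can be recomputed with a few lines. This is a sketch under the example's stated parameters (16 input lines, one message per 1000 bit times on each line, 50-bit messages); the exhaustive-service expression assumes a mean scanning time of Mv/(1 - Mρ) and zero switching-time variance. Function names are illustrative.

```python
def md1_wait(total_rate, length):
    """Mean wait in an M/D/1 queue fed at total_rate with fixed message length."""
    load = total_rate * length
    return total_rate * length ** 2 / (2.0 * (1.0 - load))


def exhaustive_wait(M, lam, length, v):
    """Mean wait for M identical queues, exhaustive cyclic service,
    fixed-length messages and deterministic switching time v."""
    rho = lam * length
    cycle = M * v / (1.0 - M * rho)  # assumed mean scanning time
    return cycle * (1.0 - rho) / 2.0 + M * lam * length ** 2 / (2.0 * (1.0 - M * rho))


if __name__ == "__main__":
    M, lam = 16, 1.0 / 1000.0
    print(md1_wait(M * lam, 50))              # addresses ignored
    print(md1_wait(M * lam, 55) + 0.5)        # 4-bit address plus a "1" prefix
    print(exhaustive_wait(M, lam, 51, 1.0))   # cyclic exhaustive service
```

The comparison illustrates the point of the text: the header-based scheme roughly doubles the delay of the address-free queue, while the cyclic scheme stays well below the header-based figure at this load.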

9.  Suggestions  for  Future  Work 

We  have  shown  in  Section  8 that  the  "sampling"  and  "polling" 
strategies  behave  in  the  same  way  in  the  fixed  length  queue  and  variable 
length  queue  cases.  Unfortunately  we  know  from  Section  3 that  they  are 
rather  inefficient.  One  would  expect  that  the  efficient  strategies  for 
the  fixed  queue  length  case  will  also  perform  well  in  a variable  length 
queue  environment.  Their  analysis  is  not  easy,  because  they  introduce 
much  memory  in  the  queueing  system,  but  should  be  attempted. 

On a more abstract level, the state of a queue can be regarded
as forming a partially observable Markov process when the input process
is  Poisson.  One  should  be  able  to  use  the  same  method  as  in  Section  3 
and  determine  a strategy  that  minimizes  the  entropy  of  the  output  sequence. 


We prove here a theorem that is used in Section 4 of Chapter 3.

Let $(\Omega, S, P)$ be a probability space. We recall that if $x : \Omega \to \mathbb{R}$ is a measurable function, $E|x| < \infty$, and if $\mathcal{B}$ is a σ-algebra included in S, $E(x|\mathcal{B})$ is defined as a $\mathcal{B}$-measurable function such that $\int_B E(x|\mathcal{B}) \, dP = \int_B x \, dP$ for every B in $\mathcal{B}$. One can show [Doob, 1953, pp. 16 and 32] that $E(x|\mathcal{B})$ exists, that any two versions of it are equal almost everywhere, and that if z is a $\mathcal{B}$-measurable function with $E|xz| < \infty$, $E(xz|\mathcal{B}) = z \, E(x|\mathcal{B})$ almost everywhere. These facts are used below.

Let
  m be a measurable function $m : \Omega \to \mathbb{N}^*$
  $b_1, b_2, \ldots$ be a sequence of measurable functions $b_i : \Omega \to \mathbb{R}$, $E(|b_i|) < \infty$, $i \in \mathbb{N}$
  $\mathcal{B}_i$ be the smallest σ-algebra for which $b_i$ is measurable
  $\mathcal{B}^i$ be the smallest σ-algebra for which $b_{i+1}, b_{i+2}, \ldots$ are measurable
  $\mathcal{B}^{(i)}$ be the smallest σ-algebra for which $b_1, b_2, \ldots, b_i$ are measurable
12  1 


then 
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where 


I Eb^  = a if  P(m^i)>0  ieIN 

m is  independent  of  the  's 
II  the  b^  's  are  mutually  independent 
E b^  » a if  P(m  ^ i)  > 0 i e IN 

(o)  : mCw)  = i}  e 8 
i.e.  m is  a Markov  time 


i e IN 


III  E b^  * a if  Pfm  ^ i)  > 0 i e IN 

E(I  . |8^)  = E(I  .)  a.e. 
m=i ' ^ m=i 

i.e.  the  event  m=i  is  independent  of 
IV  E b^  = a if  P(m  ^ i)  > 0 i e IN 

E(I  18.)  = E(I  .)  a.e. 
i.e.  the  event  Tn<i  is  independent  of  b^ 
V E(b.  I .)  = a ECI  .) 

1 m>i'^  ^ m>i'^ 


VI   $E \sum_{i=1}^{m} b_i = a \, E m$
VI'  $E \sum_{i=1}^{m} |b_i| < \infty$

VI' is a technical condition to insure that $E\big[\sum_{i=1}^{m} b_i\big]$ is well defined.


Proof

I ⇒ III: this should be clear.

II ⇒ III: it is enough to show that

$$\forall B \in \mathcal{B}^i, \quad \int_B E(I_{m=i}) \, dP = \int_B I_{m=i} \, dP$$

or $P(\{\omega : m(\omega) = i\}) \, P(B) = P(\{\omega : m(\omega) = i\} \cap B)$. This follows from the fact that $B \in \mathcal{B}^i$, and that by II $\{\omega : m(\omega) = i\} \in \mathcal{B}^{(i)}$ and the $b_i$'s are mutually independent.

III ⇒ IV:

$$E(I_{m<i} | \mathcal{B}_i) = \sum_{j=1}^{i-1} E(I_{m=j} | \mathcal{B}_i) = \sum_{j=1}^{i-1} E(I_{m=j}) \quad \text{by III, because } \mathcal{B}_i \subset \mathcal{B}^j \text{ for } j < i$$

IV ⇒ V:

$$E(b_i I_{m \ge i}) = E(E(b_i I_{m \ge i} | \mathcal{B}_i)) = E(b_i \, E(I_{m \ge i} | \mathcal{B}_i)) = E b_i \, E(I_{m \ge i}) = a \, E(I_{m \ge i}) \quad \text{by IV}$$

V ⇒ VI:

$$E \sum_{i=1}^{m} b_i = E \sum_{i=1}^{\infty} b_i I_{m \ge i} = \sum_{i=1}^{\infty} E(b_i I_{m \ge i}) \ \text{(by VI')} = \sum_{i=1}^{\infty} a \, E(I_{m \ge i}) = a \, E(m)$$

Note that we do not need to assume $E(|b_i|) < \infty$ and VI' if $P(b_i \ge 0) = 1$, $i \in \mathbb{N}$, and if we allow the value ∞.

If the $b_i$'s are independent and identically distributed, and some technical conditions are met, it is well known that I ⇒ VI, while III ⇒ VI (also known as Wald's theorem) is proved at different places in [Feller, 1966]. [Doob, 1953] proves that II ⇒ VI.

The theorem given here is very simple, and its hypothesis minimal; that IV ⇒ VI is somewhat surprising, so we give an example illustrating it.


prob.    m    b_1    b_2    b_3
3/16     1     0     -1      2
1/16     1    16      7      2
1/16     2     0     -1      0
3/16     2     0      7      0
4/16     3     0     -1      0
4/16     3     0     -1      2

We have $E b_1 = E b_2 = E b_3 = 1$ and $E m = 9/4$,

$P(m < 2 \mid b_2 = -1) = 1/4 = P(m < 2)$
$P(m < 3 \mid b_3 = 2) = 1/2 = P(m < 3)$

Thus, surely enough,

$$E \sum_{i=1}^{m} b_i = \frac{16}{16} - \frac{1}{16} + \frac{21}{16} - \frac{4}{16} + \frac{4}{16} = 9/4 = Em \, Eb_i$$
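The example can be verified with exact rational arithmetic. The sketch below reads each row of the table as a probability, a value of m and the triple $(b_1, b_2, b_3)$, which is an interpretation of the extracted layout; the variable names are illustrative.

```python
from fractions import Fraction as F

# (probability, m, (b1, b2, b3)) for each row of the example table
ROWS = [
    (F(3, 16), 1, (0, -1, 2)),
    (F(1, 16), 1, (16, 7, 2)),
    (F(1, 16), 2, (0, -1, 0)),
    (F(3, 16), 2, (0, 7, 0)),
    (F(4, 16), 3, (0, -1, 0)),
    (F(4, 16), 3, (0, -1, 2)),
]

mean_m = sum(p * m for p, m, _ in ROWS)
mean_b = [sum(p * b[i] for p, _, b in ROWS) for i in range(3)]
# Expected value of b_1 + ... + b_m, the sum stopped at the random index m
mean_stopped_sum = sum(p * sum(b[:m]) for p, m, b in ROWS)


def cond_prob_m_below(limit, index, value):
    """P(m < limit | b_index = value), with index counted from 1."""
    hits = sum(p for p, m, b in ROWS if b[index - 1] == value and m < limit)
    total = sum(p for p, m, b in ROWS if b[index - 1] == value)
    return hits / total


if __name__ == "__main__":
    print(mean_m, mean_b, mean_stopped_sum)
    print(cond_prob_m_below(2, 2, -1), cond_prob_m_below(3, 3, 2))
    print(cond_prob_m_below(2, 1, 16))  # m is not independent of b1
```

The last line shows that m depends on $b_1$ (the value 16 occurs only with m = 1), so the example does not satisfy the stronger hypothesis of full independence, yet the product formula still holds.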
Appendix B


We  prove  here  a theorem  used  in  Sections  5 and  7 of  Chapter  5 . 

The  method  is  similar  for  both  cases,  we  will  give  the  details  for  Section 
5 and  sketch  the  proof  for  Section  7 . 


We  know  that  if 

{s^,m^}  , i=0,l,.,.  (j 
ergodic;  ^ and  E m^ 
variables 


M 

p < 1 and  if  z E v < « , the  process 
^ j»l 

fixed)  is  Markovian  and  positive  recurrent,  thus 
are  finite.  For  x given,  consider  the  random 


= m^  I 


s £^X 


The process $\{z_i\}$ is also ergodic, because if a set $A$ of sequences
$\{z_i\}$ is shift invariant, so is the set
$A' = \{ \{s_i, m_i^j\} : \{z_i(s_i, m_i^j)\} \in A \}$;
$A'$ has the same probability as $A$, i.e. 0 or 1.


Theorem 


The limit, as the time increases, of the fraction $f(t)$ of
messages from origin $j$ that arrived in the queue during scanning times
of length less than or equal to $x$ is almost surely equal to

\[ \frac{1}{E(s)} \int_0^x y \, dS(y) . \]


Proof:  Denoting by $a(t)$ the number of complete scanning times up to
time $t$, we have that


\[
\frac{\sum_{i=0}^{a(t)-1} m_i^j \, I_{s_i \le x}}{\sum_{i=0}^{a(t)} m_i^j}
\;\le\; f(t) \;\le\;
\frac{\sum_{i=0}^{a(t)} m_i^j \, I_{s_i \le x}}{\sum_{i=0}^{a(t)-1} m_i^j} .
\]


By the strong ergodic theorem, the ratio of the numerators to $a(t)$
goes with probability one to $\lambda_j \int_0^x y \, dS(y)$, while the ratio
of the denominators to $a(t)$ goes with probability one to $\lambda_j E s$.

Q.E.D. 


Note  that  this  would  be  a well  known  result  of  renewal  theory 
if  the  scanning  times  were  independent,  and  if  the  arrivals  did  not 
interact  with  the  lengths  of  the  scanning  times. 
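
The limit can be illustrated numerically in the simpler renewal setting mentioned in this note. The sketch below is not the model of Chapter 5. It assumes, purely for illustration, independent scanning times uniform on {1, 2, 3} and Poisson arrivals that do not influence the scanning times; the function names and parameter values are hypothetical.

```python
import random

SCAN_VALUES = (1.0, 2.0, 3.0)   # assumed support of the scanning time s


def fraction_in_short_scans(num_scans, rate, x, seed=0):
    """Empirical fraction of arrivals that fall in scans of length <= x."""
    rng = random.Random(seed)
    short = total = 0
    for _ in range(num_scans):
        s = rng.choice(SCAN_VALUES)          # i.i.d. scanning time (assumption)
        # Count Poisson(rate) arrivals in an interval of length s by
        # accumulating exponential interarrival times.
        n, t = 0, rng.expovariate(rate)
        while t <= s:
            n += 1
            t += rng.expovariate(rate)
        total += n
        if s <= x:
            short += n
    return short / total


def limit_value(x):
    """(1 / E s) * integral_0^x y dS(y) for s uniform on SCAN_VALUES."""
    return sum(y for y in SCAN_VALUES if y <= x) / sum(SCAN_VALUES)


estimate = fraction_in_short_scans(num_scans=20000, rate=1.0, x=2.0)
print(estimate, limit_value(2.0))
```

For $x = 2$ the limit is $(1 + 2)/(1 + 2 + 3) = 1/2$, and the empirical fraction should be close to that value. Note that long scans are over-represented among arrivals, which is why the limit is weighted by $y$ rather than being simply $S(x)$.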

The proof for Section 7 goes along the same lines; the main
difference is that the process $\{m_i^j, s_i\}$ must be replaced by a process
of larger size, as we did for the $d_i$ process in Section  ,
to retain the Markov property.
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Appendix  C 


This appendix contains the listing of the FORTRAN IV subroutines
MOHUFF and LSEQ1, which implement respectively Steps I and II of the
algorithm presented in Section 4.C of Chapter II.

MOHUFF is a direct translation into FORTRAN of the algorithm
given in Step I.  It works best when the symbols are listed in order of
decreasing probabilities.

LSEQ1 computes the largest root of the equation $A^*(s) B^*(-s) = 1$
using the Newton-Raphson algorithm [Klerer and Korn, 1967, p. 2-59].
Because this algorithm works best with functions whose ratio of the second
derivative to the first derivative has small absolute value, the
subroutine computes the largest root of the equation
$\log A^*(s) + \log B^*(-s) = 0$.

In lines 14 to 18 the program searches for a starting point larger
than the largest root.  Because the Laplace-Stieltjes transforms of
probability distributions are log-convex, the sequence of values produced by
the algorithm from this starting point will converge monotonically to the
largest root.  The algorithm itself occupies lines 19 to 28.
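
The two phases described above (search for a starting point to the right of the largest root, then Newton-Raphson iteration on the log-transformed equation) can be sketched in Python as follows. This is not a transcription of LSEQ1. The transforms used, $A^*(s) = \lambda/(\lambda+s)$ and $B^*(-s) = \mu/(\mu-s)$, are an assumption chosen because the largest root, $\mu - \lambda$, is then known in closed form; the function names and the halving search are likewise illustrative.

```python
import math


def newton_largest_root(g, dg, s_lo, s_hi, tol=1e-12, max_iter=100):
    """Largest root of a convex g on (s_lo, s_hi).

    Assumes g is convex and tends to +infinity as s approaches s_hi.
    """
    # Phase 1: move toward s_hi until g > 0 and g' > 0, i.e. until the
    # point lies to the right of the largest root.
    s = 0.5 * (s_lo + s_hi)
    for _ in range(60):
        if g(s) > 0.0 and dg(s) > 0.0:
            break
        s = 0.5 * (s + s_hi)
    else:
        raise RuntimeError("no starting point found")
    # Phase 2: Newton-Raphson; by convexity the iterates decrease
    # monotonically toward the largest root.
    for _ in range(max_iter):
        step = g(s) / dg(s)
        s -= step
        if abs(step) < tol:
            return s
    raise RuntimeError("Newton-Raphson did not converge")


LAM, MU = 1.0, 3.0   # hypothetical rates, with LAM < MU


def g(s):
    # log A*(s) + log B*(-s) for the assumed exponential transforms
    return (math.log(LAM) - math.log(LAM + s)
            + math.log(MU) - math.log(MU - s))


def dg(s):
    return -1.0 / (LAM + s) + 1.0 / (MU - s)


root = newton_largest_root(g, dg, 0.0, MU)
print(root)   # close to MU - LAM
```

Working with the logarithm keeps the function convex, which is what makes the monotone convergence argument above apply.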

Function evaluations take place in lines 33 to 66.  Subroutine
INTTIM, which must be provided by the user, computes $\log A^*(s)$ and
$\frac{d}{ds} \log A^*(s)$.  If IND = 1, $B^*(s)$ is set equal to the lower bound
developed in Section 4.B of Chapter II, and the program computes $s$.  If
IND = 2, $B^*(s) = \sum_{i=1}^{c} p_i e^{-s m_i}$, and the program computes the
corresponding $s$.  When $m_i$ is constant, the same objective is attained more
efficiently by setting IND to 3.
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